In some ways, this is an extremely dull post, and relevant to very few people. However, I recently had to create a list of index entries for a chapter that I am submitting to a collected volume. I have never had to put an index together before, so this was completely new and a bit scary for me! Given that this task involved quite a steep learning curve, I thought it was worth putting down the steps I went through to create an index, for my own future reference, for the reference of others, and to provide a forum for those more experienced at this sort of thing to tell me what I’ve missed!
Step One: put the whole document through Wordle, the word-cloud generator. This immediately gives you the headline words that are super-important in your writing, from which you can extrapolate the must-have entries. (Some won’t need to go in: for instance, although ‘Seneca’ was obviously represented in big bold type, it didn’t really need to go into the index when the article title made it clear that the content dealt with Seneca, as indeed will the title of the book.) Put all the words into a spreadsheet (I use Excel; other spreadsheet programs are available), as this means you can automatically alphabetise.
Step Two: This was my big learning curve – write a macro in Word! Through the very kind assistance of Stephen Jenkin of the Classics Library, who took pity on me when I whinged about doing this on Twitter and went hunting through code fora, I discovered that it is indeed possible to write some code for a macro that will create a list of words in order of frequency used. Amazing. I have included the code Stephen sent me at the end of this post, so you can copy and paste it yourselves – it’s not bug free, but it generates the data you need. You can then add any other words that didn’t turn up on the Wordle for whatever reason to your master spreadsheet.
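If you don't have Word to hand, the same idea can be sketched in a few lines of Python. This is not Stephen's macro, only a rough equivalent of what it does; the function name `word_frequencies`, the `exclude` parameter and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(text, exclude=()):
    """Return (word, count) pairs, most frequent first."""
    # Lower-case everything so 'Seneca' and 'seneca' count as one word.
    # [^\W\d_] matches any letter, including accented ones.
    words = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)?", text.lower())
    excluded = {w.lower() for w in exclude}
    counts = Counter(w for w in words if w not in excluded)
    # Sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample text, standing in for the whole article.
sample = "Seneca wrote to his mother. His mother grieved; Seneca consoled his mother."
for word, count in word_frequencies(sample, exclude=["to", "his"]):
    print(f"{word}\t{count}")
```

The tab-separated output can be pasted straight into a spreadsheet. Unlike the macro, this sketch ignores capitalisation, which may or may not be what you want for proper names.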
Step Three: read through your article and add the page references to the spreadsheet for the terms you have listed. If you come to passages where you feel there should be a reference but there isn’t, add one. (This is the only explanation for the wonderful index reference I came across the other day, which read “patronage, this book is not about.”) This is the long boring manual bit.
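Part of this step could, in principle, be given a head start by a script, assuming you can get the text of each page into a list. The sketch below is only that: `find_page_references`, the `first_page` parameter and the three sample "pages" are hypothetical. It only catches literal occurrences of a term, so it cannot replace the read-through for passages that discuss a topic without naming it.

```python
import re


def find_page_references(pages, terms, first_page=1):
    """Map each term to the page numbers on which it literally appears."""
    references = {term: [] for term in terms}
    for page_number, page_text in enumerate(pages, start=first_page):
        for term in terms:
            # \b stops 'exile' from matching inside 'exiled'.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, page_text, flags=re.IGNORECASE):
                references[term].append(page_number)
    return references


# Hypothetical page texts; the chapter is assumed to start on page 101.
pages = [
    "Seneca writes from exile to console his mother.",
    "Consolation as a genre predates Seneca.",
    "In exile, the philosopher returns to the theme of grief.",
]
refs = find_page_references(pages, ["exile", "grief", "siblings"], first_page=101)
for term in sorted(refs):
    print(term, refs[term])
```

Terms that come back with an empty list are candidates either for a closer manual look or for deletion later on.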
Step Four: look back through your terms and see whether any of them look a bit similar or have the same sorts of page references next to them; combine them if necessary.
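If the spreadsheet were held as a simple mapping from term to page numbers, combining two entries amounts to pooling their pages and removing duplicates. A minimal sketch, assuming that layout; `merge_entries` and the sample entries are invented for the example:

```python
def merge_entries(index, keep, absorb):
    """Fold the page references of `absorb` into `keep` and drop `absorb`."""
    merged = dict(index)  # work on a copy so the original is untouched
    pages = set(merged.get(keep, [])) | set(merged.pop(absorb, []))
    merged[keep] = sorted(pages)
    return merged


# Hypothetical entries with overlapping page references.
index = {
    "consolation": [101, 104, 107],
    "consolatory writing": [104, 107, 110],
    "exile": [101, 103],
}
index = merge_entries(index, keep="consolation", absorb="consolatory writing")
print(index)
```

Deciding which entries belong together is still a judgment call; the code only does the bookkeeping once you have decided.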
Step Five: delete any words which don’t actually end up being helpful. In my initial survey of this piece, I think I added things like ‘brothers’ and ‘siblings’, neither of which ended up with any page references after them.
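With the same term-to-pages layout (again an assumption, not how my spreadsheet actually looks), pruning the empty entries and alphabetising the rest can be sketched as follows; `prune_and_format` and the sample data are hypothetical.

```python
def prune_and_format(index):
    """Drop terms with no page references and return alphabetised index lines."""
    lines = []
    for term in sorted(index, key=str.lower):
        pages = index[term]
        if pages:  # skip terms that never earned a page reference
            page_list = ", ".join(str(p) for p in sorted(pages))
            lines.append(f"{term}, {page_list}")
    return lines


# Hypothetical index; 'brothers' and 'siblings' have no references.
index = {"exile": [103, 101], "brothers": [], "consolation": [101, 104], "siblings": []}
for line in prune_and_format(index):
    print(line)
```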
I am very open to suggestions about how to go about this more efficiently, more quickly or more cheerfully, but I suspect that indexing is one of those jobs that is always going to involve a large element of nose to grindstone. The important thing is to do justice to our scholarship by putting the effort into doing it properly.
Appendix – Word macro code
As promised, some instructions and hefty code for putting a macro into Word to generate a document including a list of words in frequency order. This is entirely to the credit of Stephen Jenkin, to whom I am very, very grateful. (And I do hope that I’ve managed to preserve the formatting of his e-mail in the code…)
“In Word, click on File/Options/Customize Ribbon
Click the Developer tab on.
Click on the new Developer tab from the menu at the top
Click on Macros and then Create
A new macro-editing window will appear.
Paste the following into the box!
Sub WordFrequency()
    Dim SingleWord As String        'Raw word pulled from doc
    Const maxwords = 9000           'Maximum unique words allowed
    Dim Words(maxwords) As String   'Array to hold unique words
    Dim Freq(maxwords) As Integer   'Frequency counter for Unique Words
    Dim WordNum As Integer          'Number of unique words
    Dim ByFreq As Boolean           'Flag for sorting order
    Dim ttlwds As Long              'Total words in the document
    Dim Excludes As String          'Words to be excluded
    Dim Found As Boolean            'Temporary flag
    Dim j, k, l, Temp As Integer    'Temporary variables
    Dim tword As String             '

    ' Set up excluded words
    ' Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]"
    Excludes = ""
    Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "")
    ' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "")

    ' Find out how to sort
    ByFreq = True
    Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
    If Ans = "" Then End
    If UCase(Ans) = "WORD" Then
        ByFreq = False
    End If

    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    WordNum = 0
    ttlwds = ActiveDocument.Words.Count
    Totalwords = ActiveDocument.Words.Count

    ' Control the repeat
    For Each aword In ActiveDocument.Words
        SingleWord = Trim(aword)
        If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""        'Out of range?
        If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""     'On exclude list?
        If Len(SingleWord) > 0 Then
            Found = False
            For j = 1 To WordNum
                If Words(j) = SingleWord Then
                    Freq(j) = Freq(j) + 1
                    Found = True
                    Exit For
                End If
            Next j
            If Not Found Then
                WordNum = WordNum + 1
                Words(WordNum) = SingleWord
                Freq(WordNum) = 1
            End If
            If WordNum > maxwords - 1 Then
                j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly)
                Exit For
            End If
        End If
        ttlwds = ttlwds - 1
        StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
    Next aword

    ' Now sort it into word order
    For j = 1 To WordNum - 1
        k = j
        For l = j + 1 To WordNum
            If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
        Next l
        If k <> j Then
            tword = Words(j)
            Words(j) = Words(k)
            Words(k) = tword
            Temp = Freq(j)
            Freq(j) = Freq(k)
            Freq(k) = Temp
        End If
        StatusBar = "Sorting: " & WordNum - j
    Next j

    ' Now write out the results
    tmpName = ActiveDocument.AttachedTemplate.FullName
    Documents.Add Template:=tmpName, NewTemplate:=False
    Selection.ParagraphFormat.TabStops.ClearAll
    With Selection
        For j = 1 To WordNum
            .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
        Next j
    End With
    ActiveDocument.Range.Select
    Selection.ConvertToTable
    Selection.Collapse wdCollapseStart
    ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
    ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
    ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences"
    ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
    ActiveDocument.Tables(1).Rows.Add
    ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Total words in Document"
    ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Totalwords
    ActiveDocument.Tables(1).Rows.Add
    ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document"
    ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
    System.Cursor = wdCursorNormal
    ' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished")
    Selection.HomeKey wdStory
End Sub
Click on the save icon (disk).
Open your document
Click on Macros
Scroll down and choose WordFrequency
You have the option of excluding words from the count, but it shouldn’t make a great deal of difference.
I’d leave it to sort by FREQ
And then it should bring up a Word document with the list of words in the doc ordered by frequency.”