Classically Inclined

October 5, 2012

Creating an index

Filed under: Research — lizgloyn @ 8:56 am

In some ways, this is an extremely dull post, and relevant to very few people. However, I recently had to create a list of index entries for a chapter that I am submitting to a collected volume. I have never had to put an index together before, so this was completely new and a bit scary for me! Given that this task involved quite a steep learning curve, I thought it was worth putting down the steps I went through to create an index: for my own future reference, for the reference of others, and to provide a forum for those more experienced at this sort of thing to tell me what I’ve missed!

Step One: put the whole document through Wordle, the word-cloud generator. This immediately gives you the headline words that are super-important in your writing, from which you can extrapolate the must-have entries. (Some won’t need to go in: for instance, although ‘Seneca’ was obviously represented in big bold type, it didn’t really need to go into the index when both the article title and the title of the book make it clear that the content deals with Seneca.) Put all the words into a spreadsheet (I use Excel; other spreadsheet programs are available), as this means you can alphabetise them automatically.

Step Two: This was my big learning curve – write a macro in Word! Through the very kind assistance of Stephen Jenkin of the Classics Library, who took pity on me when I whinged about doing this on Twitter and went hunting through code fora, I discovered that it is indeed possible to write some code for a macro that will create a list of words in order of frequency used. Amazing. I have included the code Stephen sent me at the end of this post, so you can copy and paste it yourselves – it’s not bug free, but it generates the data you need. You can then add any other words that didn’t turn up on the Wordle for whatever reason to your master spreadsheet.
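For anyone working outside Word, the same kind of frequency count can be sketched in a few lines of Python. This is only an illustration and not Stephen's macro: the function name `word_frequencies`, the sample text and the list of excluded words are all assumptions made up for the example.

```python
import re
from collections import Counter

def word_frequencies(text, exclude=()):
    """Count how often each word occurs, ignoring case and any excluded words."""
    # A word is a run of letters (any alphabet), optionally joined by
    # internal apostrophes or hyphens; digits and underscores are skipped.
    words = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())
    excluded = {w.lower() for w in exclude}
    return Counter(w for w in words if w not in excluded)

# Hypothetical sample text standing in for the chapter.
sample = "Seneca writes about brothers. The brothers of Seneca appear in the letters."
counts = word_frequencies(sample, exclude={"the", "of", "in", "about"})

# Most frequent words first, similar to choosing FREQ in the macro.
for word, n in counts.most_common(3):
    print(word, n)
```

Unlike the macro, this sketch lowercases everything before counting, so ‘Seneca’ and ‘seneca’ are treated as the same word; whether that is what you want for proper names is a judgement call.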

Step Three: read through your article and add the page references to the spreadsheet for the terms you have listed. If you come to a passage that you feel should be indexed but that matches none of your terms, add a new entry. (This is the only explanation for the wonderful index reference I came across the other day, which read “patronage, this book is not about.”) This is the long boring manual bit.
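If you have the text of each page to hand, a first pass at the page references can be roughed out automatically. The sketch below is hedged accordingly: the `find_page_references` function, the `pages` dictionary and its contents are hypothetical, and it assumes you have already split the chapter into pages yourself.

```python
import re

def find_page_references(pages, terms):
    """Map each term to the sorted page numbers on which it appears."""
    references = {}
    for term in terms:
        # Whole-word, case-insensitive match, so 'brother' does not hit 'brotherhood'.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        references[term] = sorted(
            number for number, text in pages.items() if pattern.search(text)
        )
    return references

# Hypothetical page texts keyed by page number.
pages = {
    12: "Seneca discusses anger and its remedies.",
    13: "Anger, he argues, is a temporary madness.",
    14: "The family is a school for virtue.",
}
refs = find_page_references(pages, ["anger", "family", "siblings"])
print(refs)
```

This only finds literal matches, so passages that discuss an idea without naming it still have to be caught by reading through by hand.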

Step Four: look back through your terms and see whether any of them look a bit similar or have the same sorts of page references next to them; combine them if necessary.

Step Five: delete any words which don’t actually end up being helpful. In my initial survey of this piece, I think I added things like ‘brothers’ and ‘siblings’, neither of which ended up with any page references after them.
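The tidying-up in Steps Four and Five can also be sketched in code, assuming the spreadsheet has been read into a dictionary of terms and page numbers. Everything named here (`merge_entries`, `prune_empty`, `format_index` and the draft entries) is invented for illustration; deciding which terms really belong together remains a human job.

```python
def merge_entries(index, keep, absorb):
    """Fold the page references of `absorb` into `keep` and drop `absorb`."""
    merged = dict(index)
    pages = set(merged.get(keep, [])) | set(merged.pop(absorb, []))
    merged[keep] = sorted(pages)
    return merged

def prune_empty(index):
    """Remove terms that ended up with no page references."""
    return {term: pages for term, pages in index.items() if pages}

def format_index(index):
    """Return alphabetised lines such as 'anger, 12, 13'."""
    return [
        f"{term}, {', '.join(str(p) for p in pages)}"
        for term, pages in sorted(index.items(), key=lambda item: item[0].lower())
    ]

# Hypothetical draft index: term -> list of page numbers.
draft = {"family": [14, 20], "kinship": [20, 31], "brothers": [], "anger": [12, 13]}
final = prune_empty(merge_entries(draft, keep="family", absorb="kinship"))
for line in format_index(final):
    print(line)
```

Here ‘kinship’ is folded into ‘family’ without duplicating page 20, and ‘brothers’ disappears because nothing was ever listed against it.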

I am very open to suggestions about how to go about this more efficiently, more quickly or more cheerfully, but I suspect that indexing is one of those jobs that is always going to involve a large element of nose to grindstone. The important thing is to do justice to our scholarship by putting the effort in to doing it properly.

***

Appendix – Word macro code

As promised, some instructions and hefty code for putting a macro into Word to generate a document including a list of words in frequency order. This is entirely to the credit of Stephen Jenkin, to whom I am very, very grateful. (And I do hope that I’ve managed to preserve the formatting of his e-mail in the code…)

“In Word, click on File/Options/Customize Ribbon
Click the Developer tab on.

Click on the new Developer tab from the menu at the top
Click on Macros and then Create

A new macro-editing window will appear.
Paste the following into the box!

Sub WordFrequency()

         Dim SingleWord As String           'Raw word pulled from doc
         Const maxwords = 9000              'Maximum unique words allowed
         Dim Words(maxwords) As String      'Array to hold unique words
         Dim Freq(maxwords) As Long         'Frequency counter for unique words
         Dim WordNum As Integer             'Number of unique words
         Dim ByFreq As Boolean              'Flag for sorting order
         Dim ttlwds As Long                 'Total words in the document
         Dim Excludes As String             'Words to be excluded
         Dim Found As Boolean               'Temporary flag
         'Each variable needs its own type; otherwise only the last one is typed
         Dim j As Integer, k As Integer, l As Integer
         Dim Temp As Long                   'Holds a frequency during swaps
         Dim tword As String                'Holds a word during swaps

         ' Set up excluded words
'         Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]"
         Excludes = ""
         Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "")
'        Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "")
         ' Find out how to sort
         ByFreq = True
         Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
         If Ans = "" Then End
         If UCase(Ans) = "WORD" Then
             ByFreq = False
         End If

         Selection.HomeKey Unit:=wdStory
         System.Cursor = wdCursorWait
         WordNum = 0
         ttlwds = ActiveDocument.Words.Count
         Totalwords = ActiveDocument.Words.Count

         ' Control the repeat
         For Each aword In ActiveDocument.Words
             SingleWord = Trim(aword)
             If SingleWord < "A" Or SingleWord > "z" Then SingleWord = ""    'Out of range?
             'Case-insensitive check, so "The" is caught by [the]
             If InStr(1, Excludes, "[" & SingleWord & "]", vbTextCompare) > 0 Then SingleWord = ""    'On exclude list?
             If Len(SingleWord) > 0 Then
                 Found = False
                 For j = 1 To WordNum
                     If Words(j) = SingleWord Then
                         Freq(j) = Freq(j) + 1
                         Found = True
                         Exit For
                     End If
                 Next j
                 If Not Found Then
                     WordNum = WordNum + 1
                     Words(WordNum) = SingleWord
                     Freq(WordNum) = 1
                 End If
                 If WordNum > maxwords - 1 Then
                     j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly)
                     Exit For
                 End If
             End If
             ttlwds = ttlwds - 1
             StatusBar = "Remaining: " & ttlwds & "     Unique: " & WordNum
         Next aword

         ' Now sort by word or by frequency, as chosen above
         For j = 1 To WordNum - 1
             k = j
             For l = j + 1 To WordNum
                 If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
             Next l
             If k <> j Then
                 tword = Words(j)
                 Words(j) = Words(k)
                 Words(k) = tword
                 Temp = Freq(j)
                 Freq(j) = Freq(k)
                 Freq(k) = Temp
             End If
             StatusBar = "Sorting: " & WordNum - j
         Next j

         ' Now write out the results
         tmpName = ActiveDocument.AttachedTemplate.FullName
         Documents.Add Template:=tmpName, NewTemplate:=False
         Selection.ParagraphFormat.TabStops.ClearAll
         With Selection
             For j = 1 To WordNum
                 .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
             Next j
         End With
         ActiveDocument.Range.Select
         Selection.ConvertToTable
         Selection.Collapse wdCollapseStart
         ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
         ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
         ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences"
         ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
         ActiveDocument.Tables(1).Rows.Add
         ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Total words in Document"
         ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Totalwords
         ActiveDocument.Tables(1).Rows.Add
         ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document"
         ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
         System.Cursor = wdCursorNormal
         'j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished")
         Selection.HomeKey wdStory

End Sub

Click on the save icon (disk).
Open your document
Click on Macros
Scroll down and choose WordFrequency

You have the option of excluding words from the count, but it shouldn’t make a great deal of difference.
I’d leave it to sort by FREQ.

And then it should bring up a Word document with the list of words in the doc ordered by frequency.”

1 Comment »

  1. I have created indices for books a couple of times, and I think you’re right that it’s one of those things that is laborious however you do it! But I agree entirely that it is worth doing it properly for the sake of scholarship.

    My method is very different – I rely entirely on a thorough knowledge of the content of the book and then go through page by page deciding what needs indexing and what doesn’t (I hadn’t thought of using Wordle or looking up word frequencies, but that’s a nice idea for a starting point). Sometimes I might search for a word on a digital version to make sure I’ve caught all instances, but on the whole I think of it not as indexing words but as indexing references to and discussions of particular concepts. As I create the list of indexed items, I try to keep the whole list in my head so that I can keep track of how I am indexing the book conceptually[1], but at the end inevitably it is necessary to merge or lose categories sometimes. I suppose the key way in which my method differs from yours is that I make up the list of items to index as I read through the manuscript, rather than starting with a list and then adding page numbers (/anchors[2]) to the entries. (My method is probably the more laborious! It took me at least three weeks, working full days, to index my edited book.)

    I think the other main issue to address is how you go about creating an index that can then be handled by a publisher, because that relies on creating index anchors so that if pagination changes then the index entries will also automatically change. CUP allow you to create the index by annotating the manuscript by hand, which is my preferred way of doing it but is quite old-fashioned! You can also create an index using Microsoft Word’s index facility, for which there are probably online tutorials.

    [1] Actually, it is quite interesting how an index imposes a conceptual framework on a book retrospectively. I have indexed once for a book by another single author and once for a book that I was editing but that was multi-authored, but some time next year I should be indexing my own book and I expect that to be easier because the conceptual framework will presumably match up more closely with what I was thinking when I wrote the book!

    [2] This is probably obvious, but an anchor can be for a single use of a key word on a page, or it can be a running anchor for a discussion of a concept – the latter won’t necessarily end with the last use of the key word but might continue onto the next page (or further!).

    Comment by Pippa Steele, October 6, 2012 @ 2:08 pm

