The Impossible Task: Cross-Referencing the Unabridged

As I mentioned on the Twit Machine recently, I have been working on a very exciting project: a new edition of Webster’s Third New International Dictionary, Unabridged.

“About frickin’ time!” fans of the Third hollered in one thunderous voice, and with good reason: the Third was released in 1961. It has been updated by means of an Addenda Section once every seven or so years, but an A-Z revision has been long overdue. We will be the first people to tell you that, longingly, as we peer out from underneath the production schedule.

And so we’ve begun the long, slow work of revising and updating. There is a stately surrealism to stripping down and refurbishing one of America’s most celebrated and controversial dictionaries, kind of like taking the Pope underwear-shopping. When you get right down to it, you are left there in your small mortality, looking at the boxer-briefs of something that has been revered and hallowed for longer than you’ve been on this earth, and that is unsettling.

Nonetheless, here I am, staring intently at the varicosities of the Third and doing my best to patch them up.

Over the years, I’ve been asked why we don’t just slap some new words into the Third while we’re mucking about with new Collegiate editions. Hell, it’s just data, my dictionary-loving friends would say. It’s just an entry. It’ll only take you two extra minutes.

I have discovered that it’s not just an entry, and it’s not just two extra minutes, because of something called “cross-reference.”

Every dictionary you use has rules about the words entered therein, and one of the basic rules of any decent dictionary is that you cannot use a word in the definitions, usage notes, or example sentences that is not defined somewhere, somehow in that very dictionary. That sounds sensible, but you’d be surprised how many discount dictionaries don’t follow this rule–and what a difficult rule it is to follow, even in this digital age. In order to make sure that this rule is followed, we have a whole group of editors whose job is to beat the track of the alphabet, hoovering up all the information they can about the words in this book, and making everything tidy.
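For the programmers in the audience, the rule can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function name `words_not_in`, the tiny entry list, the `exempt` set (for proper names, say), and the crude tokenizer are all invented for this example and do not describe any real editorial system.

```python
import re


def words_not_in(text, entries, exempt=frozenset()):
    """Return the words in `text` that have no entry (hypothetical helper)."""
    # Crude tokenizer: runs of letters, optionally joined by hyphens or
    # apostrophes, so a hyphenated compound is checked as a single word.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    missing = []
    for token in tokens:
        if token in entries or token in exempt:
            continue
        if token not in missing:  # report each missing word only once
            missing.append(token)
    return missing


# Toy data, made up for the example: "slant" has deliberately been left out.
entries = {"took", "a", "pass", "and", "dove", "toward", "the", "pylon"}
quotation = "Taylor took a slant pass and dove toward the pylon"

print(words_not_in(quotation, entries, exempt={"taylor"}))
```

Even this toy version hints at the trouble: a hyphenated compound counts as its own word, so having entries for both halves is no help if the hyphenated form itself is not entered.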

I was recently pulled from doing some subject-specific defining and put on the ever expanding task of making sure new entries are entered properly into the data. Part of this involves some cross-reference work, but “not a lot,” as the Director of Editorial Operations put it. “Just a bit.”

Silly me, I took “just a bit” at face value. In fact, “just a bit” means “there’s quite a lot and you will only find and correct a little bit of it.”

My very first entry gave me trouble. There was a word in a quotation that looked odd. I don’t think that’s supposed to be hyphenated, I thought, and so I went to the Third. No, indeed, it was entered in the Third as a closed compound, and I patted myself on the back for being so observant. Mid-pat, I realized I then had to do something about that.

There are options available to the editor doing cross-reference, but none of them is easy. The simplest choice is to alter the quotation to omit the troublesome word. Of course, as luck would have it, this wasn’t possible in this case, as the word to be omitted was the verb of the sentence, and a verbless example sentence was certainly going to raise a few eyebrows when this new dictionary came out. Well, then, I’d just have to find another quotation to sub in. Off to the citation files, where I found the absolute perfect substitute. Oh, it was gorgeous: short, idiomatic, completely covering the contextual meaning and connotation of the word in question, and the author’s name made me giggle (last name: Butters). This was it. After running it through the cross-reference gauntlet, I discovered it used two words not entered in this dictionary.

The next option is to see if the compounding style of this word is going to change at all in the new edition. We base this on citational information, so a quick search of the database showed me that the hyphenated and closed compounds had roughly the same amount of use. I shoot an e-mail to the Director of Defining and ask him if he has any advice. His response is, “Look through the revision files. Quickly.” Because like all dictionaries, this one has a deadline and we will make many, many people (not least of whom, the Publisher) sad if we push it back.

The revision files yield many surprises, chief of which is that some of the entries in it are from editors who came and went 20 years ago–the Third has, let’s remember, been in need of revision for a long time–and their notes have been appended by successive generations of editors who are correcting or reiterating their point. (“Style was once open; now determinedly hyphenated. A. Editor, 1982.” “Style now closed; ignore previous note. B. Editor, 1986.” “Word is open compound. Ignore A. & B., they are morons. C. Editor, 1992.”) I open one notes file. It is several hundred pages long.

After some searching, I find a note for this entry that leads me to believe that the hyphenated compound will not be entered. I make an assortment of irritated editorial noises and, after opening the cit files again, start looking for a third replacement sentence. An hour has gone by and I have spent it on one quotation at one entry. The word I am agonizing over is not even the word I’m entering: it is peripheral, incidental. But when you are doing cross-reference, nothing is peripheral or incidental.

Some variation of this continues for the rest of the letter, then progressive batches, and the number of annoying e-mails I send to my colleagues skyrockets. I can almost hear the server groan when I hit “New Message” and begin my fourteenth e-mail of the day to one of the science editors. “Me again. What are you going to do with ‘thumb drive’? I’m sure you haven’t even given it a thought, but can you give it one for me in, say, the next ten minutes?” I send more e-mails to the Director of Defining. “Howdy. Do you have any thoughts on how to handle the expansion of ‘HIPAA’?” And again, later: “One more: can I edit ‘douche-canoe’ down to ‘douche …’ in this quotation for ‘bromantic,’ or will I have to enter a new sense of ‘canoe’? If I’m doing that, should I just enter ‘douche-canoe’?”

It’s not just a matter of hunting down compounding styles. There are the new entries that require other new entries, each of those requiring two new entries, one of which will require substantial revision to another four entries, two of which will require new etymologies. One medical entry requires that I re-open 9 letters for revision and ask our Pronunciation Editor for six new prons in letters he’d already done. It takes me four hours to enter all this into the file.

At one point, I spend time trying to find a better quotation for a word to avoid the dread hyphenated-but-not-entered-as-such compound, only to discover 30 minutes into my search that the hyphen in question is actually an end-of-line break, and so not a real hyphen at all. The only upside to this is that the quotation I can now retain was written by someone with another chortle-inducing name. We take joy where we can find it.

Every inquiry leads me down a garden path of more inquiry, until I am lost in the weeds and just want to lie down in the grass and sleep for many years. I’m in so many different letters at once, I can’t tell you where I am in the project. (Here the Publisher frowns.) And here is the most perverse thing of all: even with all the time I’m putting in making sure that all these entries are tidy, there is no way I will catch every cross-reference error. Words that I assume are entered are not; styles that I assume are fine will be changed; words will be dropped or modified during copyediting, setting off another string of cross-reference changes. When I try to explain what the cross-reference work is like to another general definer, I sum it up by saying, “Google ‘ping-pong balls, mousetraps, and nuclear chain reaction.'” The ping-pong balls are the entries. All those sprung, upended mousetraps are me.

That is why we have Cross-Reference, the stalwart department who does this for every damn book we publish. Cross-Reference consists of the sweetest people on the editorial floor, but make no mistake: they are brilliant in ways that blabbering dilettantes like me cannot possibly comprehend. Consider: I have only done cross-reference work digitally, but there are people in our Cross-Ref department who remember the days when they did this by hand–when checking on the proposed styling of a new entry involved a silent plod across the editorial floor, a short aerobics routine that involved carefully lifting and stacking galleys, and tens of thousands of index cards. At one point, I asked one of the Cross-Ref editors how they knew that a styling change would be made later in the alphabet. “Oh,” she said, “you just keep track. Most of it just sticks in there, in all those nooks and crannies in your mind.”

I considered, not for the first time, that I must have a very smooth brain.

They not only catch mistakes, but are lightning fast. They have to be: by the time they get a finished dictionary, they usually only have a few weeks to do their work before the book is due at the printer’s, and the printer gets very cranky if we are late. When the defining work is done, everyone breathes a huge sigh of relief and we celebrate with doughnuts, but no one gives a thought to the tireless drudges who are still–quietly, cheerfully–making sure that we haven’t used “douche-canoe” in an entry without defining it. There is very little glory in lexicography, and where there is glory, definers and etymologists get it all. But Cross-Ref are the ones who actually deserve it.

So when you read a dictionary entry in the new unabridged and have to look up another word in said book, raise a glass to the masterful editors of Cross-Reference, and be very glad that I am not one of them.



Filed under lexicography, making word sausage

27 responses to “The Impossible Task: Cross-Referencing the Unabridged”

  1. Very nice! And Phew. At first I thought I was out of a job! 🙂

    • korystamper

      Maria, I can say with authority that you will never, ever, ever be out of a job, especially while slobs like me work at MW. 🙂

      (Ladies and gentlemen, please meet and pay homage to one of the heroes of MW: Maria Sansalone, Cross-Ref.)

  2. I love your tweets and blogs. This one was particularly enlightening and enjoyable.

  3. And to think I used to envy lexicographers their glamorous work.

    When the new edition eventually arrives, I hope the publishers find a way to quote you on the jacket: “like taking the Pope underwear-shopping!” Out of context, sure, but an irresistible line for unwary casual shoppers.

  4. johnwcowan

    Every inquiry leads me down a garden path of more inquiry, until I am lost in the weeds and just want to lie down in the grass and sleep for many years.

    Y’know, you are channelling the great Cham again:

    “To deliberate whenever I doubted, to enquire whenever I was ignorant, would have protracted the undertaking without end, and, perhaps, without much improvement; […] one enquiry only gave occasion to another, that book referred to book, that to search was not always to find, and to find was not always to be informed.”

    But I must say that applying the Word Not In rule to quotations strikes me as insane. I mean, I looked up cham in the OED, and there in the third quotation, Richard Eden’s 1553 translation of Sebastian Münster, is Tartaria. WNI. It’s a proper name, and they don’t go in. (Well, except that the first quotation, from Mandeville, has Cathay, and that is in; go figure.)

    So I looked up sly, for whatever obscure psychological reasons, and the first quotation is from the Ormulum, and it’s “Her wass wiss filippe sleh & ȝæp. & haȝherr hunnte.” Quite a few WNIs there. I don’t even have a clue what “filippe” might be, but WNI for sure. Okay, Orm made up his own spellings (which is why he’s important today), so maybe that’s not the fairest example to pick on.

    After looking for go figure and being gratified to find it, I then moved on in the OED to figure v. A nice modern quotation this time (by OED2 standards): “1826 B. Disraeli Vivian Grey I. ii. xiv. 206 On the door of one of the shabbiest houses in Jermyn-street, the name of Mr. Stapylton Toad for a long time figured.” Jermyn-street? Stapylton? WNI! WNI! WNI!

    Gaah. This is insane. Quotations are quotations. Who’s going to look up all the words in quotations? Words in definitions, that I can understand: some words just have to be defined in terms of harder words if they are to be defined at all (now I’m channeling Boswell). But quotations? Silly waste of time.

    • korystamper

      Well, the WNI rule doesn’t apply to proper names, so that helps tremendously. Likewise, taxonomic names don’t require entry, compounds whose meanings are covered by existing definitions are not entered, and etyma are defined within the etymology. I’d wager that OED editors would say that all those Middle English and Early Modern English spellings are, in fact, covered: they’re listed in the “spellings” section of the OED online. And the coverage rules for the OED may well be different since it’s a different critter than the Third.

      I will grant that it is insanity, but it’s not insanity merely for insanity’s sake. (There is some of that in lexicography, too.) Consider this quotation:

      <<Taylor took a slant pass from Doug Johnson, avoided a tackle and dove toward the pylon for the game’s first touchdown.>>

      I find this at the entry for “pylon” (and you will as well, when I finish the cross-ref on this entry and we post it). Shocking though this may be for an American, I actually know very little about football. I now know what “pylon” means here, since that’s the entry at which this quotation is found, but I have no idea what a slant pass is, so I’m going to look up “slant” to help me understand what in the wide, wide world of sports Taylor is doing here.

      Or how about this:

      <<Two-way exchanges between us are such that today, it may be as easy to savor kimchi and soju in Seattle as it is to grab a burger and beer in Seoul.>>

      I can almost guarantee–would almost bet on it–that people will read this quotation and look up “kimchi.” Then they will be very surprised that it is not a hamburger, since that’s the comparison drawn here.

      You are absolutely right that the heart and soul of cross-reference work is to make sure that all the words in a definition have coverage. But while we’re at it, it doesn’t hurt to make sure that we’ve covered words in quotations that may lead readers down a garden path of inquiry all their own.

      • I sort of agree with johnwcowan, though; I don’t need to understand what a slant pass is to know that the pylon is something Taylor could drive toward, but knowing what a slant pass is doesn’t do anything to help me understand what a pylon is. On the other hand, if Taylor crashed into the pylon beyond the goal line and broke his shoulder, I’d know a pylon is something I don’t want to crash into; from your example, it could be a flag, some sort of official, or a stripe on the ground. The slant pass has nothing to do with it.

        As far as the kimchi is concerned, I know from the quotation that it’s something I can savor, and whether I know that it’s not hamburger doesn’t matter to me; I can savor both kimchi and soju, and I’m guessing (since I don’t know what you’re defining here) that soju is something savory and maybe fast-food-like (since the comparison is with American fast food).
        On the other hand, if your quotation said somebody stared hopelessly into his soju, the quotation would be wasted on me completely.

        My greater concern is whether the quotation says anything that helps me grasp the word in a meaningful context or it’s just there for the sake of quotation. If it helps me with the context usage of the word, the words around it don’t matter much to me.

  5. Forget smooth brains and brains with nooks and crannies—how does this job not simply make your brain explode?

    • What makes a king out of a slave? Courage! What makes the flag on the mast to wave? Courage! What makes the elephant charge his tusk in the misty mist, or the dusky dusk? What makes the muskrat guard his musk? Courage! What makes the sphinx the seventh wonder? Courage! What makes the dawn come up like thunder? Courage! What makes the Hottentot so hot? What puts the “ape” in apricot? What have they got that I ain’t got? …

      • johnwcowan

        Y’know, that’s an interesting transitive use of charge there. Though I know one warrior (of the strictly D & D variety) who used to rush into battle screaming “Pay Cash! Pay Cash!”

  6. I have enjoyed reading your blog – it is entertaining and enlightening.

    I have some ignorant questions. If the correct hyphenation of a word is disputed, why not put both versions in the dictionary? Also, shouldn’t you be including the quotations verbatim?

    Also I’m somewhat baffled that your “perfect substitute” quotation includes words that are not in an unabridged dictionary…

    • korystamper

      Good questions! In some of our books, we only show the most common compounding style to save space. (This, of course, is subject to change.) And yes, quotations do go in verbatim–so the “perfect substitute” wasn’t eligible since it included two words that were not entered in the unabridged dictionary. Hence the continued search. Poor Mr. Butters.

  7. Pingback: 5 blogs you should be following, Vol. 2 « General Tso's Revenge

  8. This has nothing to do with the topic at hand, but it’s been bothering me for a long time, and since you’re the only M-W person I know, the question goes to you: why the devil does M-W cap “etesian”? It’s not based on a proper noun, it’s from a Greek word meaning ‘annual,’ and every other dictionary in the world (well, the AHD and OED, anyway) has it lowercase, the way the Great God Lexic intended. It doesn’t come up often enough that I have that peculiarity memorized, but when I run across it in text I’m editing (as just happened) and am about to lowercase it, a little warning bell goes off in my head and I look it up “just to be sure” and see that inexplicable “Etesian,” slam my hand down on my desk (startling the cat), and utter loud expletives that used to be unprintable. The only way you can truly relieve my spirit is by promising me that it will be lowercase in the next edition, but it would assuage my pain a tiny bit to have a rational explanation.

    • Kory Stamper

      FIVE YEARS TOO LATE, I am here as a balm unto Gilead. My many apologies to your long-suffering cat.

      I can’t promise it will be lowercase in the next edition, but I can perhaps give a half-assed and probably unsatisfactory explanation as to why it appears capped in our dictionaries: that’s how it’s been used historically, possibly because it was treated as a pseudo-proper noun. From the 17th and 18th centuries we have capped evidence for “the Etesian winds” and “the Etesian Winds,” and even when the fad for capitalizing Important Words in one’s Letters faded, “Etesian” held on to the uppercase “E” in defiance of the Great God Lexic. Modern use for it, such as it is, shows more capped use than not, but there is lowercase use creeping in.

      So there is hope for some sort of logical downcasing, but we’ll probably all be long gone by the time it happens. So the wind blows.

  9. This piece reminds me of the canteen scene in 1984 where Smith sits with Syme, and Syme talks about his work, on the 11th edition of the Newspeak dictionary. You know the bit where Syme enthusiastically describes working on “the 11th” and discusses the nuances and technicalities of publishing a dictionary, except of course his WNI problems would be the inverse of yours. Making sure that no quotes exist using words that were in the dictionary but are to be taken out.

    Fascinating piece, it’s all very obvious – after someone tells you about it.
    Thanks for blogging.

    • And not just quotes, it’s a basic rule for most books, not including any that covers a specialized field: “every word and sense used in the dictionary is covered by the dictionary” … That’s why we call it “cross-reference tracking,” we track words. Some days, it’s playful hide-and-seek, some days it’s Sherlock Holmes, on the case … ~ maria

      • Douglas Good

        It would be nice whenever headwords are removed to also remove corresponding example word citations. For instance, see combining form acanth- and simple word acanthocarpous. When acanthocarpous was dropped from the 2nd edition of MWU, the example word continues in MWU 3rd.

  10. I am thinking that the word “unabridged” does not mean what I imagine it to mean when you write “wasn’t eligible since it included two words that were not entered in the unabridged dictionary”. If, as defines it, unabridged means “complete” how could there be any words, let alone two, (two!), words missing!?

    • korystamper

      Ah, just because a dictionary is unabridged doesn’t mean it has all the words in it. Unabridged dictionaries have entry criteria like every other dictionary, they just cast a wider net. In this case, the words were too new to merit entry into the unabridged dictionary. Maybe they’ll be entered in a few years, but that’s not really a valid excuse when cross-referencing.

      The reality is that no unabridged dictionary out there has all the words in the language in it. If you pick one up that claims to, then it was written by sneaky, lying liars.

    • Just to add to what Kory said: “unabridged” does not mean “complete,” and does not say it does; it says “being the most complete of its class : not based on one larger <an unabridged dictionary>.” No dictionary could conceivably be “complete” in the sense of including all the words in English (a phrase that cannot even be defined); if it is unabridged, that simply means it is the largest available from that publisher (other, smaller, ones being abridged from it).

      And Kory, I’m still sitting here tapping my feet, waiting for an answer about “etesian”!

  11. Five years later, I’m still sitting here tapping my feet, waiting for an answer about “etesian”…

