The Historical Thesaurus of English – Co-fhaclair Eachdraidheil na Beurla

Marc Alexander, Professor of English Linguistics, took over as director of the Historical Thesaurus of English project in 2011, following in the footsteps of Professor Michael Samuels and Professor Christian J. Kay. Alexander has worked on the project for over a decade, helping to produce the printed Historical Thesaurus of the Oxford English Dictionary and carrying out research with its database. He is now in the midst of producing the second edition.

In 1963 Professor Samuels concluded that the English language required a historical thesaurus – an inventory of all word senses recorded across time and the evolution of their meanings – in order to better understand our language, culture, and history.  Samuels decided that he and his department would take on this mammoth task, which involved identifying and classifying the meaning of every known word from the last two millennia of English.

They began by using the Oxford English Dictionary, Anglo-Saxon dictionaries, and dictionaries of current English to compile slips of paper listing each word, its definition, and the dates when each sense was in use. This wasn’t always straightforward because some words have had many different meanings over time – strike, for example, has 256 meanings, and fall has 206. By the 1980s they had approximately 1.1 million slips of paper, each carrying a single word sense.

The next step involved categorising each of those pieces of paper into bundles based on what the words meant. For example, Alexander, on arrival at the Thesaurus office, was given three drawers full of words related to Mathematics to categorise (almost 7,000 words) and a desk next to a colleague who was wrestling with the words for ‘audible breathing’ (including sighing and snoring). The detailed structure of concepts then had to be arranged into a hierarchy so that, for example, the word ‘guilt’ can be found next to ‘wickedness’ and far away from unrelated meanings like ‘toaster’. The first edition of the Historical Thesaurus of the Oxford English Dictionary was published in 2009 under the direction of the late Professor Kay. It included 797,000 words, all in 236,000 categories of meaning. The second edition, which Alexander is responsible for, will include a further 100,000 words, mainly featuring recent developments in English (most particularly in the areas of computing and the Internet).
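The idea of a hierarchy in which ‘guilt’ sits next to ‘wickedness’ and far from ‘toaster’ can be illustrated with a minimal Python sketch. It assumes each word is stored with the path of categories leading down to it; the category names and the `distance` function are invented for illustration and are not the Thesaurus’s actual classification or software.

```python
# Hypothetical category paths, from most general to most specific.
# These labels are made up for the example, not taken from the Thesaurus.
PATHS = {
    "guilt": ("the mind", "morality", "moral evil", "guilt"),
    "wickedness": ("the mind", "morality", "moral evil", "wickedness"),
    "toaster": ("the world", "food and drink", "cooking equipment", "toaster"),
}


def distance(a, b, paths=PATHS):
    """Count the steps up from `a` to the deepest shared category, then down to `b`."""
    path_a, path_b = paths[a], paths[b]
    shared = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)
```

With these made-up paths, `distance("guilt", "wickedness")` gives 2, while `distance("guilt", "toaster")` gives 8, because the two words share no categories at all.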

The Historical Thesaurus has now been integrated with the Oxford English Dictionary online, and it’s being used by psychologists, literary scholars, translators, publishers and novelists. For writers working on historical fiction, fantasy, or sci-fi, the Thesaurus can help to ensure that the language used is authentic. The Pulitzer Prize-winning novelist Geraldine Brooks has said in interviews that the ‘amazing’ Historical Thesaurus is her ‘go-to resource’ and ‘the stuff of dreams’ for a writer; Philip Pullman said he ‘can hardly imagine any reference book more valuable’; Alexander McCall Smith described the Thesaurus as ‘momentous’ and bringing ‘endless pleasure’; and Melvyn Bragg called it ‘outstanding and indispensable and so much fun!’

Alexander is also involved in the SAMUELS project, which is adding semantic annotations to text using the Thesaurus. This matters because 69% of words in the English language have more than one meaning, so researchers run into real problems when they search large volumes of text using keywords. The ‘tagger’ that the SAMUELS project is developing allows researchers to search for a particular sense of a word rather than just the word itself. The amount of digital data in use keeps growing, and researchers want more efficient searching and data retrieval. For instance, if you wanted to search the records of the UK Parliament for the word ‘strike’, with its 256 possible meanings, the results could overwhelm you (it is found almost 88,000 times in the Hansard debate record alone).

However, with each word in the project’s Hansard Corpus tagged with the Thesaurus hierarchy, an interface can instead present search results based on a word’s meaning. Concepts that sit nearby in the Thesaurus, such as ‘unions’, ‘industrial action’ and ‘workplace’, identify the sense of strike to do with industrial unrest, as opposed to hitting something, clocks announcing the hour, dismantling scenery, lighting a match, making a coin, finding oil, snakes attacking, and so on.
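As a rough illustration of how nearby concepts can point to one sense of ‘strike’, here is a minimal Python sketch. It is a simple word-overlap approach, not the SAMUELS tagger’s actual method; the sense labels, the concept lists, and the `guess_sense` function are all hypothetical.

```python
# Hypothetical sense inventory: each sense of "strike" is paired with a few
# words for concepts that might sit near it in a thesaurus (illustrative only).
SENSES = {
    "industrial action": {"union", "unions", "workplace", "wages", "employer", "picket"},
    "hit something": {"blow", "fist", "hammer", "hit"},
    "light a match": {"match", "flame", "fire", "light"},
    "find oil": {"oil", "drilling", "well", "gold"},
}


def guess_sense(sentence, senses=SENSES):
    """Return the sense whose nearby concepts overlap most with the sentence."""
    # Lower-case each word and trim surrounding punctuation before comparing.
    words = {w.strip(".,;:!?'\"").lower() for w in sentence.split()}
    scores = {sense: len(words & concepts) for sense, concepts in senses.items()}
    best = max(scores, key=scores.get)
    # With no overlap at all there is no evidence for any sense.
    return best if scores[best] > 0 else None
```

Given the sentence "The unions called a strike over wages at the workplace.", `guess_sense` returns "industrial action", because three of that sense’s concept words appear in the sentence and none of the other senses’ words do. A sentence with no matching context words returns `None`.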

The tagger also lets us identify the concepts and topics that MPs have discussed most at any given time; Alexander pointed out that, according to the tagger, the least-discussed concept in all of Parliament over the past 200 years was the supernatural, which is fairly encouraging. The tool should also be transferable to other English-speaking parliaments.

If you’d like to find out more about the Thesaurus or the tagger, you can contact us at arts-ke@glasgow.ac.uk.

Post Author: Nicole Cassie
