The investigation of an organising force in text

The metaphor of 'attraction' from physics is well established in linguistics. It characterises the situation whereby a word is not evenly or randomly distributed across texts, but is found close to its preferred word partners (or 'collocates') in certain textual positions. In our previous work, the focus has been on the circumstances in which words significantly prefer each other's company, whether in adjacent pairings (span of 1 word to the left and right) or discontinuous phrasal or grammatical frameworks (e.g. span of 4 words to the left and right). In the AVIATOR project we built a 'collocational profile' for each word and used statistics to identify the most significant collocates.
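To make the notion of a span concrete, the following is a minimal sketch of how collocates within a fixed window might be counted. It is not the AVIATOR implementation: the function name `collocates`, the whitespace tokenisation and the sample sentence are all assumptions made for illustration, and no significance statistic is applied.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens to the left or right
    of every occurrence of `node` (hypothetical helper, raw counts only)."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Clip the window at the start and end of the text.
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Invented sample sentence, tokenised naively on whitespace.
tokens = "we wish you a merry christmas and a happy new year".split()

adjacent = collocates(tokens, "christmas", span=1)  # span of 1 left and right
wider = collocates(tokens, "christmas", span=4)     # span of 4 left and right
```

A real collocational profile would be built from a large corpus, and the raw counts would then be passed to a statistical test to separate significant collocates from chance neighbours.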

In this project, we are testing the hypothesis that there is another 'force', which we shall call 'repulsion', that operates on the construction of text in the opposite way. By repulsion, we mean the system of conventional language use which discourages certain pairs of words from occurring together; for instance, it is possible in English to say 'Happy Christmas', 'Merry Christmas' and 'Happy Birthday', but not 'Merry Birthday'. The goal of the study is to examine the reasons behind this and more complex examples, to establish whether and how consistently the force operates, and to determine whether it holds two words at a measurable distance from each other.

Absence of attraction is not the same as repulsion. We wish to establish a measure to differentiate between the non-co-occurrence of two words simply because they have no particular association, and actual repulsion.
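One way to picture this distinction is to compare the observed number of co-occurrences with the number expected if the two words were distributed independently. The sketch below is only an illustration of that idea, not the measure developed in the project: the independence model, the Poisson approximation, the threshold `alpha`, the function names and all the frequencies in the usage lines are assumptions.

```python
import math

def expected_cooccurrence(freq_a, freq_b, corpus_size, span=1):
    """Expected co-occurrences of A and B if they were independent:
    each of the 2*span positions around an A holds B with
    probability freq_b / corpus_size (a simplifying assumption)."""
    return freq_a * freq_b * 2 * span / corpus_size

def poisson_cdf(k, mean):
    """Probability of seeing k or fewer events under a Poisson model."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i)
               for i in range(k + 1))

def classify(observed, freq_a, freq_b, corpus_size, span=1, alpha=0.001):
    """Label a word pair; `alpha` is an arbitrary illustrative threshold."""
    expected = expected_cooccurrence(freq_a, freq_b, corpus_size, span)
    # How surprising is it to see this few co-occurrences by chance?
    p_low = poisson_cdf(observed, expected)
    if observed < expected and p_low < alpha:
        return "candidate repulsion"
    return "no particular association"

# Invented frequencies in a hypothetical corpus of one million tokens.
frequent_pair = classify(0, 2000, 5000, 1_000_000)  # about 20 expected, none seen
rare_pair = classify(0, 20, 50, 1_000_000)          # about 0.002 expected, none seen
```

Under these assumptions, two frequent words that never meet are flagged as a candidate for repulsion, whereas two rare words that never meet are not, because their absence together is what chance alone would predict.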

This project will reveal facts about an unexplored but fundamental and potentially exploitable aspect of language in use, and the results will be important to whatever extent the hypothesis proves true. The IT applications for this new measure will complement and supplement the use of collocation measures. While collocation can indicate which word is the normal (correct) choice for a given context, 'repulsion' will pick up on unusual (incorrect) choices of words, serving as a tool to identify errors and evaluate suggested choices in drafts by writers of text and international users of English.

The full set of tools developed for the Repulsion project is not available online, but a demo version showing repulsion between any two sample words can be found here.

Acknowledgement: The Repulsion Project was funded by the EPSRC from 2006 to 2007 (Grant Reference EP/D502551/1).