“a unique empirical and linguistically-informed approach to ... the nature and regularities of the surface word patterns of a text”

The Research and Development Unit for English Studies has, over the last two decades, taken a unique empirical and linguistically-informed approach to the testing of hypotheses about the nature and regularities of the surface word patterns of a text, and about their applicability in automatically identifying and extracting conceptual content (or 'aboutness'), word meaning and semantic equivalence from large textual databases. Projects have included:

AVIATOR Analysis of Verbal Interaction & Automated Text Retrieval
An automated system for the identification of new words and new uses of existing words. The software is designed as a series of filters, through which the textual data 'flows' at regular intervals, thus providing a diachronic view of linguistic events.
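
The filtering idea can be illustrated with a minimal sketch. It is not the AVIATOR software: it assumes a single hypothetical filter that compares each dated batch of text against the vocabulary seen in all earlier batches, and the names `tokenize` and `new_words_by_period` are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def new_words_by_period(batches):
    """Yield (label, sorted new words) for each dated batch, oldest first.

    `batches` is an iterable of (label, text) pairs. A word counts as
    new if no earlier batch contained it (an assumed, simplified filter).
    """
    seen = set()
    for label, text in batches:
        words = set(tokenize(text))
        yield label, sorted(words - seen)
        seen |= words


if __name__ == "__main__":
    data = [("period 1", "the cat sat"), ("period 2", "the cat surfed the web")]
    for label, words in new_words_by_period(data):
        print(label, words)
```

Because the vocabulary accumulates from batch to batch, the first batch reports every word as new; in practice a large reference corpus would seed the vocabulary first.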

ACRONYM Automatic Collocational Retrieval of 'Nyms'
An automated system to identify semantically-related pairs of words (or 'nyms'), based on the similarity of their collocational environments. The results are being used as a basis for a description of the textual thesaurus, in document retrieval as an alternative search-term facility, and as an automatic thesaurus generator.
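
As a rough sketch of comparing collocational environments (not the ACRONYM algorithm itself), the code below builds a collocate profile for each word and compares two profiles by cosine similarity. The window size, the cosine measure and the function names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def collocate_profiles(sentences, window=2):
    """Map each word to a Counter of the words found within `window` tokens of it."""
    profiles = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, node in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    profiles[node][tokens[j]] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two collocate Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    corpus = [
        "the doctor treated the patient",
        "the physician treated the patient",
        "the dog chased the ball",
    ]
    p = collocate_profiles(corpus)
    print(cosine(p["doctor"], p["physician"]), cosine(p["doctor"], p["dog"]))
```

In this toy corpus 'doctor' and 'physician' share identical collocates and so score higher than 'doctor' and 'dog'. Function words such as 'the' inflate every score, so a realistic version would filter or weight them.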

SEAGULL Summary Extraction Algorithm Generated Using Lexical Links
An automated summarisation system which produces cohesive summaries (abridgements) of texts by extracting key topic-bearing sentences.
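
Extractive summarisation by lexical links can be sketched as follows. This is an illustrative simplification, not the SEAGULL algorithm: it assumes that two sentences are linked when they share at least `min_links` content words, and that the sentences with the most links carry the topic. The stopword list and function names are hypothetical.

```python
import re

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "are",
             "it", "by", "for", "with"}


def content_words(sentence):
    """Return the set of lowercase non-stopword tokens in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in tokens if w not in STOPWORDS}


def summarise(sentences, n=2, min_links=2):
    """Return the n sentences linked to the most other sentences, in text order."""
    words = [content_words(s) for s in sentences]
    scores = [
        sum(1 for j, other in enumerate(words)
            if j != i and len(current & other) >= min_links)
        for i, current in enumerate(words)
    ]
    # Rank by score (highest first), breaking ties by position in the text.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    text = [
        "Corpus tools extract word patterns from text.",
        "The weather was pleasant.",
        "Word patterns in text reveal topic.",
        "Corpus tools also rank topic sentences.",
    ]
    print(summarise(text))
```

Returning the chosen sentences in their original order is a small step towards a cohesive abridgement, since it preserves the sequence of the source text.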

APRIL Analysis and Prediction of Innovation in the Lexicon
An automated system of classification for rare new words in text. Sample new word listings are available on our neologisms page. A neologism software demo is also available.

SHARES System of Hypermatrix Analysis, Retrieval, Evaluation & Summarisation
An automated system for the retrieval of similar documents, based on the hypothesis that similar patterns of lexical repetition are sufficiently maintained across differently authored documents on similar topics.
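
The hypothesis can be illustrated with a deliberately crude sketch that does not attempt to reproduce the SHARES hypermatrix analysis. It assumes that a document's pattern of lexical repetition can be approximated by the set of words it repeats, and compares two documents by the Jaccard overlap of those sets. The function names and the `min_count` threshold are hypothetical.

```python
import re
from collections import Counter


def repeated_words(text, min_count=2):
    """Return the set of words occurring at least `min_count` times in a text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {w for w, c in counts.items() if c >= min_count}


def repetition_similarity(doc_a, doc_b):
    """Jaccard overlap between the repeated-word sets of two documents."""
    a, b = repeated_words(doc_a), repeated_words(doc_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    d1 = "corpus tools search the corpus and the tools rank results"
    d2 = "a corpus needs tools; good tools build a better corpus"
    d3 = "the match ended and the match report followed"
    print(repetition_similarity(d1, d2), repetition_similarity(d1, d3))
```

Here the two documents about corpus tools overlap more than the unrelated pair, despite different wording elsewhere. Repeated function words are counted too, which a fuller treatment would need to address.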

WebCorp The Web as a Corpus
A set of tools to access the web and treat its textual content as a linguistic resource. Demo versions of the original WebCorp tools are available and we are building a complete WebCorp Linguist's Search Engine to improve performance. Our latest work aims to introduce empirical text study at school level by tailoring WebCorpLSE to the requirements of the A-Level syllabus.

Repulsion The investigation of an organising force in text
The metaphor of ‘attraction’ from physics is established in linguistics. It characterises the situation whereby a word is not evenly or randomly distributed across texts, but is found close to its preferred word partners (or ‘collocates’) in certain textual positions. In this project, we are testing the hypothesis that there is another ‘force’, which we call ‘repulsion’, that operates on the construction of text in the opposite way. By repulsion, we mean the system of conventional language use which discourages certain pairs of words from occurring together.
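
One simple way to picture the contrast between attraction and repulsion is to compare how often two words actually share a sentence with how often they would be expected to if they were distributed independently. The sketch below is an assumed illustration, not the project's method; the sentence-level window and the name `cooccurrence_ratio` are hypothetical.

```python
import re


def cooccurrence_ratio(sentences, w1, w2):
    """Return observed / expected sentence co-occurrence for two words.

    Values above 1 suggest attraction, values below 1 suggest repulsion.
    Returns None when either word is absent, so no expectation exists.
    """
    token_sets = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    n = len(token_sets)
    n1 = sum(w1 in t for t in token_sets)
    n2 = sum(w2 in t for t in token_sets)
    both = sum(w1 in t and w2 in t for t in token_sets)
    # Expected count of shared sentences if the two words were independent.
    expected = n1 * n2 / n if n else 0.0
    return both / expected if expected else None


if __name__ == "__main__":
    corpus = [
        "strong tea please",
        "strong tea again",
        "powerful engine here",
        "powerful car there",
    ]
    print(cooccurrence_ratio(corpus, "strong", "tea"))
    print(cooccurrence_ratio(corpus, "powerful", "tea"))
```

With only four sentences the ratios are illustrative at best. Distinguishing genuine repulsion from chance absence requires a large corpus and a significance test, because most word pairs never co-occur simply through sparsity.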

eMargin An online collaborative textual annotation resource
We are developing a web-based system to allow students to annotate electronic texts and share these annotations with each other. The system, called eMargin, is designed to offer a digital equivalent of the marginalia associated with the academic study of texts, from underlining and colour-coded highlighting, to notes and comments on particular parts of the page.