“Corpus Linguistics: past and present”

PDF versions of several papers can be accessed by following the links shown. Some articles are available as drafts or abstracts only. Many items are available at Amazon.com.

Matt Gee
Andrew Kehoe
Antoinette Renouf
Former Unit Staff
PhD Students


Matt Gee

Publication list can be found on his School of English profile.



Andrew Kehoe

Publication list can be found on his School of English profile.



Antoinette Renouf

1984 'Corpus Development at Birmingham University', in Corpus Linguistics: Recent Developments in the Use of Computer Corpora in English Language Research, edited by Aarts, Jan, and Willem Meijs, Rodopi, Amsterdam, pp. 3-39.

1986 'The Elicitation of Spoken English', in English in Speech and Writing: Proceedings from the symposium on English in speech and writing, Uppsala, Oct 5, 1984, edited by Tottie, Gunnel and Ingegard Backlund, Almqvist and Wiksell, Stockholm, pp. 177-197.

1987 'Lexical Resolution', in Corpus Linguistics and Beyond: Proceedings of the 7th International Conference on English Language Research on Computerised Corpora, edited by Willem Meijs, Rodopi, Amsterdam, pp. 121-131.

1987 'The Exploitation of a Computerised Corpus of English Text', in Actes du VIIIe Colloque du GERAS, edited by Michele Rivas, Centre de Recherche CERLACA, Universite de Paris-Dauphine, pp. 123-136. Also in Papers from the Conference of Departments of English in Finland, CDEF 86, eds. Nyyssonen, Heikki, Kataja, Riitta, and Vesa Komulainen, Dept. of English, Oulu University, pp. 7-20.

1987 'Corpus Development', in Looking Up, edited by John McH. Sinclair, Collins, London, pp. 1-40.

1987 'Moving On', in Looking Up, edited by John McH. Sinclair, Collins, London, pp. 167-178.

1988 with John McH. Sinclair, 'A Lexical Syllabus for Language Learning', in Vocabulary and Language Teaching, by Carter Ronald A., and Michael J. McCarthy, Longman, Harlow, pp. 140-160.

1988 'Coding Metalanguage: Issues Raised in the Creation and Processing of Specialised Corpora', in Corpus Linguistics, Hard and Soft: Proceedings of the 8th ICAME Conference, edited by Ihalainen, Ossi, Kyto, Merja, and Matti Rissanen, Rodopi, Amsterdam, pp. 197-206.

1989 'Progress Report on Corpus Linguistics at Birmingham', in ICAME Journal, No.14, ed. Johansson, Stig, N.A.V.F., Bergen.

1990 ed. 'Aspects of Work in Corpus Linguistics at Birmingham University', in Proceedings from the Stockholm Conference on the Use of Computers in Language Research and Teaching, Sept. 1989. Stockholm Papers in English Language and Literature 6, Department of English, University of Stockholm, pp. 85-92.

1991 with John McH. Sinclair, 'Collocational Frameworks In English', in English Corpus Linguistics: Studies in Honour of Jan Svartvik, eds. Aijmer, Karin, and Bengt Altenberg, Longman, Harlow, pp. 128-143.

1992 'What Do You Think of That?: A Pilot Study of the Phraseology of the Core Words of English', in New Directions in English Language Corpora, ed. Leitner, Gerhard, Mouton de Gruyter, Berlin, pp. 301-317.

1992 'The Establishment and Use of Text Corpora at Birmingham University', in Hermes Journal of Linguistics 1991, ed. Bergenholtz, Henning, Arhus Business School, Arhus.

1993 'A Word in Time: first findings from dynamic corpus investigation' in English Language Corpora: Design, Analysis and Exploitation, eds. Aarts, Jan, de Haan, Pieter, and Nelleke Oostdijk, Rodopi, Amsterdam, pp. 279-288.

1993 'Sticking to the text: a corpus linguist's view of language', in ASLIB Proceedings, Volume 45/5, May 1993, pp. 131-136.

1993 'What the Linguist has to say to the Information Scientist', in The Journal of Document and Text Management, ed. Gibb, Forbes, vol. 1/2, 1993, pp. 173-190.

1993 'Making Sense of Text: Automated Approaches to Meaning Extraction', in Proceedings of 17th International Online Information Meeting, 7-9 December 1993, pp. 77-86.

1994 'Corpora and Historical Dictionaries', in Early Dictionary Databases, eds. Lancashire, Ian, & T. Russon Wooldridge, Univ. of Toronto, Oct 1-8, 1993, pp. 219-235.

1995 also in Informatique et Dictionnaires Anciens, in series Dictionnaire et Lexicographie, series ed. Quemada, Bernard, Didier Erudition, Paris, pp. 219-235.

1994 with Campanelli, Pamela, Channell, Joanna, McAulay, Liz, and Roger Thomas, Training: An exploration of the word and the concept with an analysis of the implications for survey design, Employment Department Research Series No. 30.

1995 with Collier, Alex, 'A System of Automatic Textual Abridgement', in Proceedings of AI'95, 15th International Conference, Language Engineering '95, Montpellier, June 27-30, 1995, pp. 395-407.

1996 with Baayen, Harald, 'Chronicling the Times: Productive Lexical Innovations in an English Newspaper', Language, 72.1, pp. 69-96.

1996 'Managing the Teaching of Corpus Linguistics', in Teaching and Language Corpora, eds. Wichmann, Anne, Fligelstone, Steve, McEnery, Tony, and Gerry Knowles, Longman, Harlow, pp. 255-266.

1996 'Les Nyms: en quête du thesaurus des textes', in Lingvisticae Investigationes XX:1, Benjamins, Amsterdam, pp. 145-165.

1996 'The ACRONYM Project: Discovering the Textual Thesaurus', in Papers from English Language Research on Computerized Corpora (ICAME 16), eds. Lancashire, Ian, Meyer, Charles & Carol Percy, Rodopi, Amsterdam, pp. 171-187.

1997 'Tools for the Diachronic Study of Historical Corpora', in To Explain the Present: Studies of the changing English language in honour of Matti Rissanen, eds Nevalainen, T, and L Kahlas-Tarkka, series Memoires de la Societe Neophilologique de Helsinki L11, Helsinki Univ. pp. 185-199.

1998 with Baayen, R. Harald, 'Aviating among the Hapax Legomena: morphological grammaticalisation in current British Newspaper English', in Explorations in Corpus Linguistics, ed. Renouf, Antoinette, Rodopi, Amsterdam, pp.

1998 with Pacey, Mike, and Alex Collier, 'Refining the Automatic Identification of Conceptual Relations in Large-scale Corpora', in Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, 15-16 August 1998, ed. Charniak, Eugene, COLING-ACL, pp. 76-84.

1998 (editor) Explorations in Corpus Linguistics, Rodopi, Amsterdam.

2001 'Lexical Signals of Word Relations', Scott, Mike & Geoff Thompson (eds) (2001). Patterns of Text: in honour of Michael Hoey. Amsterdam/Philadelphia: John Benjamins Publishing Co. pp. 35-54.

2001 with Bauer, Laurie: 'Contextual Clues to Word-Meaning', International Journal of Corpus Linguistics, Vol. 5(2), Amsterdam/Philadelphia: John Benjamins Publishing Co. pp. 231-258.

2001 with Bauer, Laurie, "A Corpus-Based Study of Compounding in English", in Journal of English Linguistics, 29(2), ed. Meyer, Charles F., University of Massachusetts at Boston, Sage Publications, pp. 100-123

2002 "The Time Dimension in Modern Corpus Linguistics", in Bernhard Kettemann & Georg Marko (eds.) Teaching and Learning by Doing Corpus Analysis. Papers from the 4th International Conference on Teaching and Learning Corpora, Graz, 19/24 July 2000, Amsterdam/Atlanta GA: Rodopi. pp. 27-41

2002 with Kehoe, A. WebCorp: Applying the Web to Linguistics and Linguistics to the Web. World Wide Web 2002 Conference, Honolulu, Hawaii, 7-11 May 2002.

2003 "WebCorp: providing a renewable data source for corpus linguists", in Granger, S. & S. Petch-Tyson (eds.) Extending the scope of corpus-based research: new applications, new challenges, Amsterdam/Atlanta GA: Rodopi pp. 39-58

2004 with Kehoe, Andrew and David Mezquiriz: "The Accidental Corpus: issues involved in extracting linguistic information from the Web", in Aijmer, Karin & Bengt Altenberg (eds.) Proceedings of 21st ICAME Conference, University of Gothenburg, May 22-26 2002, Amsterdam/Atlanta GA: Rodopi pp.

2003 Chinese translation of: "Lexical Signals of Word Relations", in Jin, G Dr: Window to the Computational Linguistics, Jisuan Yuyan Xueshichuang, Beijing, pp. 230-253

2003 with Bauer, Laurie: Chinese translation of: "A Corpus-Based Study of Compounding in English", in Jin, G Dr: Window to the Computational Linguistics, Jisuan Yuyan Xueshichuang, Beijing, pp. 254-282

2003 with Morley B. & A. Kehoe Linguistic Research with the XML/RDF aware WebCorp Tool in Proceedings of WWW2003, Budapest.

2004 Shall we Hors-d'Oeuvres? Uses and Misuses of Gallicisms in English in Laporte, Eric, Christian Leclère, Mireille Piot et Max Silberztein (eds.) Syntaxe, Lexique et lexique-Grammaire: Hommage à Maurice Gross, Lingvisticae Investigationes Supplementa 24, John Benjamins Publishing Co., Amsterdam/Philadelphia, pp. 527-545

2004 with A. Kehoe. Textual Distraction as a Basis for Evaluating Automatic Summarisers, in Lino, M. T., Xavier, M. F., Ferreira, F. et al. (eds.). Proceedings of LREC-2004. vols. I-4. Lisboa, Portugal. ELRA

2005 "Issues of automatic phrase retrieval in web text", in pre-proceedings of Phraseology 2005: The many faces of Phraseology. An interdisciplinary conference, 13-15 October 2005, Louvain-la-Neuve, Belgium.

2005 with Kehoe, A and J. Banerjee. The WebCorp Search Engine: a holistic approach to Web text Search, in electronic Proceedings of CL2005, University of Birmingham

2005 'Phrasal creativity viewed from an IT perspective', in RANAM (Recherches Anglaises et Nord Américaines) no.38: Language chunks and linguistic units, ed. A. Hamm, Université Marc Bloch, Strasbourg. pp. 113-122

2005 'Corpus Linguistics: past and present', in Wei Naixing, Wenzhong, Li, Pu Jianzhong (eds.), Corpora in Use: In honour of Professor Yang Huizhong

2006 'The Turing Test Applied to Automatic Linguistic Analysis.' In: Alwin Fill/Georg Marko/David Newby/Hermine Penz (eds.) (2006). Linguists (Do Not) Only Talk About It. Tübingen: Stauffenburg. 123-128.

2006 with Kehoe, A. & J. Banerjee WebCorp: an integrated system for web text search, in Hundt, M., N. Nesselhauf & C. Biewer (eds.), Corpus Linguistics and the Web, Amsterdam: Rodopi.

2007 'Corpus Development 25 years on: from super-corpus to cyber-corpus?', in Facchinetti, Roberta (ed.) Corpus Linguistics 25 Years on. Amsterdam & New York: Rodopi. In Series: Language and Computers - Studies in Practical Linguistics 62. 27-49

2007 Tracing lexical productivity and creativity in the British media: The Chavs and the Chav-nots?, in Judith Munat (ed.) Lexical Creativity, Texts and Contexts. Amsterdam: John Benjamins Publishing Company. 61-89

2007 with Banerjee, J. The search for repulsion: a new corpus analytical approach in Pahta, P., I. Taavitsainen, T. Nevalainen & J. Tyrkkö (eds.) Towards Multimedia in Corpus Studies, electronic publication, University of Helsinki.

2007 with Banerjee, J. "Lexical Repulsion between sense-related pairs" in M. Mahlberg (ed.), International Journal of Corpus Linguistics: 12.3. John Benjamins Publishing Company. pp 415-443. ISSN 1384-6655

2009 with Banerjee, J. "The Phenomenon of Repulsion in Text", in 'Special edition of Proceedings of 25th International Conference on Lexis and Grammar, Palermo, Sicily, Sept. 6-10, 2006', in Lingvisticae Investigationes, Leclère, C et al (eds.). Amsterdam: John Benjamins Publishing Company.

2009 "Corpus Linguistics beyond Google: the WebCorp Linguist's Search Engine" in 'New Paths for Computing Humanists', Siemens, R. and G. Shawver (eds.) in Digital Studies / Le champ numérique Vol 1, No 1 (ISSN 1918-3666) the Society for Digital Humanities / Société pour l'étude des médias interactifs (SDH/SEMI)

2009 with Kehoe, A. (eds.) Corpus Linguistics: Refinements and Reassessments, Amsterdam: Rodopi. (ISBN-13: 978-9042025974)

2010 with Banerjee, J. "Lexical repulsion and its Applications", in Actes du 27e Colloque international sur le lexique et la grammaire (L'Aquila, 10-13 septembre 2008), ed. M de Gioia, in series Lingue d'Europa e del Mediterraneo: Grammatica comparata, Aracne, Rome.

2010 "Identification automatique de la néologie lexicologique et sémantique : questions soulevées par notre méthode", in Cabré, M.T.; Domènech, O.; Estopà, R.; Freixa, J.; Lorente, M. (eds.). Actes del I Congrés Internacional de Neologia de les Llengües Romàniques. Barcelona: Institut Universitari de Lingüística Aplicada; Documenta Universitaria, 2010. 129-141.

2010 "A Case Study of the suffix /ette/ in English", in 'Les Tables: La grammaire du français par le menu', Nakamura, T; Laporte, E; Dister, A; and C. Fairon (eds), UCL Presses Universitaires de Louvain.

2012 "Defining neology to meet the needs of the translator", in Humbley, John, and Jean-François Sablayrolles (eds.) Neologica 6, Paris, Editions Classiques, Garnier.

2012 "The Nature of Neology in English: a corpus-based investigation", in Proceedings of ICAME 32, University of Oslo, Amsterdam: Rodopi.

2012 "A Finer Definition of Neology in English: the life-cycle of a word", in Hasselgård, Hilde, Signe Oksefjell Ebeling and Jarle Ebeling (eds.). Corpus Perspectives on Patterns of Lexis. Amsterdam/ Philadelphia: John Benjamins Publishing Company.



Former Unit Staff

• Jayeeta Banerjee

2004 SHARES User Guide.

2005 with Renouf, A. & Kehoe, A. "The WebCorp Search Engine" at IGM 'Lexis and Grammar' UCE and IGM Marne-la-Vallee, Liverpool Sept. 14-18, 2005

2005 with Renouf, A. & Kehoe, A. The WebCorp Search Engine: A holistic approach to web text search, in Proceedings from the Corpus Linguistics Conference Series, Vol. 1, no. 1, ISSN 1747-9398

2006 with Renouf, A. & Kehoe, A. WebCorp: an integrated system for web text search, in Hundt, M., N. Nesselhauf & C. Biewer (eds.), Corpus Linguistics and the Web, Amsterdam: Rodopi.

2007 with Renouf, A. The search for repulsion: a new corpus analytical approach in Pahta, P., I. Taavitsainen, T. Nevalainen & J. Tyrkkö (eds.) Towards Multimedia in Corpus Studies, VARIENG eSeries, Vol. 2. University of Helsinki.

2007 with Renouf, A. Lexical Repulsion between sense-related pairs in M. Mahlberg (ed.), International Journal of Corpus Linguistics: 12.3. John Benjamins Publishing Company. pp 415-443. ISSN 1384-6655

(forthcoming in 2007) with A. Renouf. "The phenomenon of Repulsion in text" in special edition of Proceedings of 25th International Conference on Lexis and Grammar, Palermo, Sicily, Sept. 6-10, 2006, Lingvisticae Investigationes, Leclère, C. et al (eds.). Amsterdam: John Benjamins Publishing Company.

• Susan Blackwell

1993 'From dirty data to clean language', in English Language Corpora: Design, Analysis and Exploitation: Papers from the 13th ICAME Conference, Nijmegen 1992, eds. Aarts, Jan, Pieter de Haan and Nelleke Oostdijk, Rodopi, Amsterdam, pp. 97-105.

• Alex Collier

1990 'The Birmingham Johnson Dictionary Project: A Corpus-based Approach to Historical Lexicography', in Proceedings from the Stockholm Conference on the Use of Computers in Language Research and Teaching, Sept. 1989. Stockholm Papers in English Language and Literature 6, Dept. of English, University of Stockholm, pp. 1-9.

1993 'Issues of large-scale collocational analysis', in English Language Corpora: Design, Analysis and Exploitation: Papers from the 13th ICAME Conference, Nijmegen 1992, eds. Aarts, Jan, Pieter de Haan and Nelleke Oostdijk, Rodopi, Amsterdam, pp. 289-298.

1994 'Software for the Johnson Dictionary Project', in Early Dictionary Databases eds. Lancashire, Ian, and T R Wooldridge, University of Toronto, pp. 197-202.

1995 with Renouf, Antoinette, 'A system of automatic textual abridgement', in Proceedings of AI'95, 15th International Conference, Language Engineering '95, Montpellier, June 27-30, 1995, pp. 395-407.

1997 with Pacey, Mike, 'A Large-scale Corpus System for Identifying Thesaural Relations', in Corpus-based Studies in English: Papers from the seventeenth International Conference on English Language Research on Computerized Corpora (ICAME 17), Stockholm, May 15-19, 1996, ed. Ljung, Magnus, Rodopi, Amsterdam, pp. 87-100.

1998 with Pacey, Mike, and Antoinette Renouf, 'Refining the Automatic Identification of Conceptual Relations in Large-scale Corpora', in Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, 15-16 August 1998 ed. Charniak, Eugene, COLING-ACL, pp. 76-84.

1998 'Big can be beautiful: The automatic selection of concordance lines from large corpora'. In TALC98 Conference Proceedings, Keble College, Oxford, 24-27 July 1998 pp. 42-46.

1998 'Identifying diachronic change in semantic relations'. In Explorations in Corpus Linguistics, selected papers from ICAME 18, Chester, May 1997 pp. 259-268.

• David Mezquiriz

2004 with Renouf, A. and Kehoe, Andrew: "The Accidental Corpus: issues involved in extracting linguistic information from the Web", in Aijmer, Karin & Bengt Altenberg (eds.) Proceedings of 21st ICAME Conference, University of Gothenburg, May 22-26 2002, Amsterdam/Atlanta GA: Rodopi pp.

• Barry Morley

2003 with Kehoe, A. & A. Renouf Linguistic Research with the XML/RDF aware WebCorp Tool in Proceedings of WWW2003, Budapest.

2006 'WebCorp: A Tool for Online Linguistic Information Retrieval and Analysis' in Renouf, A. & Kehoe, A. (eds.) The Changing Face of Corpus Linguistics, Amsterdam/Atlanta GA: Rodopi

• Mike Pacey

1997 with Collier, Alex, 'A Large-scale Corpus System for Identifying Thesaural Relations', in Corpus-based Studies in English: Papers from the seventeenth International Conference on English Language Research on Computerized Corpora (ICAME 17), Stockholm, May 15-19, 1996, ed. Ljung, Magnus, Rodopi, Amsterdam, pp. 87-100.

1997 with Fligelstone, S. and P. Rayson, 'How to generalise the task of annotation', in Corpus Annotation: Linguistic Information from Computer Text Corpora, eds. Garside, R., Leech, G. and A. McEnery, Longman, London, pp. 122-136.

1998 'The use of clustering techniques to reveal semantic relations between words', in Explorations in Corpus Linguistics, ed. Renouf, Antoinette, Rodopi, Amsterdam, pp. 269-280.

1998 with Collier, Alex, and Antoinette Renouf, 'Refining the Automatic Identification of Conceptual Relations in Large-scale Corpora', in Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, 15-16 August 1998 ed. Charniak, Eugene, COLING-ACL, pp. 76-84.



PhD Students

The following theses appear here with the permission of the University of Liverpool and the authors.

1996-9 Steve Jones: A Corpus-based study of Antonymy (book published by Routledge, 2002), supervised by A. Renouf and M.P. Hoey.

1998-2003 Debbie Danks: A Corpus-based Study of Blending in English Word Formation, supervised by A. Renouf and M.P. Hoey.

2001-4 Ceri Davies: A Corpus-Based Investigation Of Noun To Verb Conversion In English, supervised by A. Renouf.