“a digital equivalent of the marginalia associated with the close reading of literary texts”

CASE XML Conversion Tool

We are currently working on an exciting project to record English spoken by students in academic institutions around the world. The Corpus of Academic Spoken English (CASE) is being compiled by a team of researchers at Saarland University. Birmingham City University is one partner providing students for the project, and we are also developing software to support the analysis of the transcribed spoken data.

Our CASE XML Conversion tool converts the project's default mark-up (based on discourse analysis notation) to a bespoke XML schema, encapsulating all of the original information in a machine-readable form. The XML versions of the transcripts will enable additional levels of computational analysis. For example, XPath searches make it relatively easy to find features of the texts and to extract frequency information about them. Our sister software, ConcXML, is designed to do just that, as well as producing concordance, word frequency and n-gram output, as may be expected from any corpus linguistics analysis tool. We hope that the machine-readable XML will assist in performing further analyses as the project progresses.
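As a rough illustration of the kind of XPath search and frequency count described above, here is a minimal Python sketch. The CASE schema itself is not reproduced on this page, so the element and attribute names used here (`transcript`, `turn`, `speaker`, `pause`, `overlap`, `laugh`) are invented purely for the example, as are the two helper functions; this is not how ConcXML is implemented. It uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical transcript. Element and attribute names are assumptions
# made for this sketch and do not reflect the real CASE XML schema.
SAMPLE = """<transcript>
  <turn speaker="A">so <pause length="0.5"/> what do you think <overlap>about it</overlap></turn>
  <turn speaker="B"><overlap>well</overlap> I think <pause length="1.2"/> it is fine</turn>
  <turn speaker="A">okay <laugh/></turn>
</transcript>"""


def feature_counts(xml_text, features):
    """Count how often each named element occurs anywhere in the transcript."""
    root = ET.fromstring(xml_text)
    # ".//name" is an XPath expression matching all descendants called name.
    return {name: len(root.findall(f".//{name}")) for name in features}


def pauses_by_speaker(xml_text):
    """Count pause elements per speaker, summing over all of a speaker's turns."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for turn in root.findall(".//turn"):
        counts[turn.get("speaker")] += len(turn.findall(".//pause"))
    return dict(counts)


if __name__ == "__main__":
    print(feature_counts(SAMPLE, ["pause", "overlap", "laugh"]))
    print(pauses_by_speaker(SAMPLE))
```

Because every annotation is an element rather than an inline symbol, each feature can be located with a single path expression, and counts can be broken down by any attribute the schema records, such as the speaker.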

Find out more about the CASE Project on their Google Plus page.

Try converting your own transcripts into XML. Note that your mark-up will probably differ from the notation the tool understands. If you are interested in learning more, please contact Matt Gee.