OBC: The Old Bailey Corpus 2.0
The Old Bailey Corpus is a sociolinguistically, pragmatically and textually annotated corpus based on a selection of the Proceedings of the Old Bailey. It consists of 637 texts recording trial proceedings that took place between 1720 and 1913 at the Old Bailey, London. The corpus contains more than 24 million words; its overall size is over 35 million tokens (words, punctuation, etc.). More detailed information about the corpus is available here, as well as in the official OBC Manual.
The corpus is licensed under CC BY-NC-SA 4.0.
The digitization process
The original pages of the Proceedings were scanned, and the scans are now available at Old Bailey Online; you can access individual scans by clicking the “see original” link to the right of the text of any trial (e.g. here). The texts were then manually transcribed by multiple typists, and optical character recognition (OCR) software was used to create transcriptions for comparison, so that any differences or inaccuracies could be resolved. However, as the original pages are often faded or otherwise damaged (see, for example, here), the transcriptions cannot be guaranteed to be 100% accurate. Users are therefore advised to consult the scanned pages when a very precise reading is required. More on the digitization process here.
The texts were marked up in XML (Extensible Markup Language) according to the TEI (Text Encoding Initiative) guidelines. Each doc structure represents one proceeding and consists of multiple text structures. The first of these is usually the front matter, and the following ones contain the trial accounts themselves; in each case the kind of content is indicated by the type attribute.
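The doc/text layout described above can be illustrated with a minimal Python sketch. The element names doc and text and the type attribute are taken from the description; the sample markup, the id value and the attribute values "frontMatter" and "trialAccount" are assumptions made for illustration only and may not match the actual corpus files. The helper split_proceeding is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of one proceeding. Only the names "doc", "text"
# and "type" come from the corpus description; everything else
# (id, attribute values, contents) is invented for illustration.
SAMPLE = """
<doc id="example-proceeding">
  <text type="frontMatter">Title page and list of jurors.</text>
  <text type="trialAccount">First trial account.</text>
  <text type="trialAccount">Second trial account.</text>
</doc>
"""

def split_proceeding(xml_string):
    """Split one doc structure into front matter and trial accounts.

    Relies on the type attribute rather than on position, because the
    first text structure is only *usually* the front matter.
    """
    doc = ET.fromstring(xml_string.strip())
    texts = doc.findall("text")
    front = [t for t in texts if t.get("type") == "frontMatter"]
    trials = [t for t in texts if t.get("type") != "frontMatter"]
    return front, trials

front, trials = split_proceeding(SAMPLE)
for t in trials:
    print(t.get("type"), "-", t.text)
```

Filtering on the type attribute instead of taking the first text element keeps the sketch valid for proceedings that do not begin with front matter.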
Wiki course
For a basic overview of how to use the OBC and how to input data into the search interface, see our wiki course in eight lessons:
THIS WILL OF COURSE NEED TO BE EDITED AND REDIRECTED TO THE OBC.
How to cite
OBC: The Old Bailey Corpus 2.0. Ústav Českého národního korpusu FF UK, Prague 2020. Available from WWW: http://www.korpus.cz
The original Old Bailey Corpus: Huber, M. - Nissel, M. - Puga, K. (2016): Old Bailey Corpus 2.0. hdl:11858/00-246C-0000-0023-8CFB-2
The Old Bailey Proceedings Online: Hitchcock, T. - Shoemaker, R. - Emsley, C. - Howard, S. - McLaughlin, J. et al. (2012): The Old Bailey Proceedings Online, 1674-1913. www.oldbaileyonline.org, version 7.0, 24 March 2012.