Transcribing Medieval Manuscripts for Machine Learning
Estelle Guéville, David Joseph Wrisley

TL;DR
This paper explores the automation of medieval manuscript transcription using handwritten text recognition (HTR), emphasizing the importance of tailored transcription schemes for scholarly research and demonstrating new analytical possibilities at scale.
Contribution
It introduces guidelines for customizing transcription schemes for medieval manuscripts, integrating HTR technology to facilitate large-scale textual analysis and scholarly inquiry.
Findings
HTR enables scalable transcription of medieval manuscripts.
Customized transcription schemes improve analysis of scribal variation.
HTR transcriptions support new research questions in medieval studies.
Abstract
This article focuses on the transcription of medieval manuscripts. Whereas problems of transcription have long interested medievalists, few workable options in the era of printed editions were available besides normalisation. The automation of this process, known as handwritten text recognition (HTR), has made new kinds of digital text creation possible, but also has foregrounded the necessity of theorising transcription in our scholarly practices. We reflect here on different notions of transcription against the backdrop of changing text technologies. Moreover, drawing on our own research on medieval Latin Bibles, we present general guidelines for customizing transcription schemes, arguing that they must be designed with specific research questions and scholarly end use in mind. Since we are particularly interested in the scribal contribution to the production of codices, our…
Taxonomy
Topics: Digital Humanities and Scholarship · Natural Language Processing Techniques
