Automatic transcription of 17th century English text in Contemporary English with NooJ: Method and Evaluation
Odile Piton (SAMM), Slim Mesfar (RIADI), Hélène Pignot (SAMM)

TL;DR
This paper presents a method using NooJ to automatically transcribe 17th century English texts into contemporary English, highlighting linguistic differences and developing tools for lexical and syntactical transcription.
Contribution
It introduces a transcription approach that combines lexical and syntactical graphs within NooJ to automate the conversion of archaic forms into modern English.
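A minimal Python sketch of what a syntactical rule adds over word-by-word lookup, assuming a hypothetical table PAIR_RULES of archaic pronoun-verb pairs. It is not NooJ code and not the authors' graphs. It only illustrates why context matters: "art" after "thou" is a verb to be modernized, whereas "art" on its own may be the noun (as in "the art of navigation") and must be left unchanged.

```python
# Hypothetical two-token rules: (archaic pronoun, archaic verb) -> modern pair.
# Illustrative only; these are not the authors' NooJ syntactical graphs.
PAIR_RULES: dict[tuple[str, str], tuple[str, str]] = {
    ("thou", "art"): ("you", "are"),
    ("thou", "hast"): ("you", "have"),
    ("thou", "knowest"): ("you", "know"),
    ("thou", "wilt"): ("you", "will"),
}


def rewrite_pairs(tokens: list[str]) -> list[str]:
    """Scan left to right and rewrite any adjacent pair listed in PAIR_RULES."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            key = (tokens[i].lower(), tokens[i + 1].lower())
            if key in PAIR_RULES:
                first, second = PAIR_RULES[key]
                # Preserve sentence-initial capitalisation of the pronoun.
                if tokens[i][:1].isupper():
                    first = first.capitalize()
                out.extend([first, second])
                i += 2  # both tokens consumed by the rule
                continue
        out.append(tokens[i])  # no rule applies: copy the token unchanged
        i += 1
    return out


print(rewrite_pairs("Thou art welcome , and thou knowest it".split()))
print(rewrite_pairs("the art of navigation".split()))
```

Matching on pairs rather than single words is the design choice here: the rule fires only when the archaic verb is licensed by its subject, so the noun reading of "art" is never touched.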
Findings
Created lexical variant dictionaries and transcription rules
Developed syntactical graphs for archaic form transcription
Proposed a framework for automated transcription of historical texts
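A lexical variant dictionary can be pictured as a lookup from archaic spellings to contemporary forms. The sketch below is plain Python rather than a NooJ dictionary; the VARIANTS entries and the function transcribe_lexical are hypothetical illustrations, not the authors' resources. Each replacement is logged so that a human can check it, loosely echoing the validation step described in the abstract.

```python
import re

# Hypothetical variant dictionary: archaic spelling -> contemporary form.
VARIANTS: dict[str, str] = {
    "shew": "show",
    "onely": "only",
    "publick": "public",
    "hath": "has",
    "doth": "does",
}

WORD = re.compile(r"[A-Za-z]+")


def transcribe_lexical(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace known variants and return the new text plus a log of (old, new) pairs."""
    changes: list[tuple[str, str]] = []

    def replace(match: re.Match) -> str:
        word = match.group(0)
        modern = VARIANTS.get(word.lower())
        if modern is None:
            return word  # not a listed variant: keep the original spelling
        if word[0].isupper():
            modern = modern.capitalize()
        changes.append((word, modern))
        return modern

    # re.sub with a function keeps the original spacing and punctuation intact.
    return WORD.sub(replace, text), changes


new_text, log = transcribe_lexical("He hath onely come to shew the publick his wares.")
print(new_text)
print(log)
```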
Abstract
Since 2006 we have been describing the differences between 17th century English and contemporary English (CE) with the help of NLP software. Studying a corpus spanning the whole century (tales of English travellers in the Ottoman Empire in the 17th century, Mary Astell's essay A Serious Proposal to the Ladies, and other literary texts) has enabled us to highlight various lexical, morphological and grammatical peculiarities. Using the NooJ linguistic platform, we created dictionaries indexing the lexical variants and their transcription in CE. These transcriptions often result from the validation of forms recognized dynamically by morphological graphs. We also built syntactical graphs aimed at transcribing certain archaic forms into contemporary English. Our previous research involved a succession of elementary steps alternating textual analysis and result validation. We managed to provide examples…
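Recognizing forms dynamically with morphological graphs and then validating them can be approximated as candidate generation followed by filtering. In this hypothetical Python sketch, SUFFIX_RULES rewrites archaic endings, several rules may fire on the same word, and only candidates found in a small stand-in lexicon (MODERN_LEXICON) are kept. The abstract does not say how validation is performed; simulating it with a lexicon lookup is an assumption, and none of this is NooJ syntax.

```python
# Hypothetical suffix rules (archaic ending -> contemporary ending).
SUFFIX_RULES: list[tuple[str, str]] = [
    ("eth", "es"),  # goeth -> goes
    ("eth", "s"),   # speaketh -> speaks
    ("ick", "ic"),  # musick -> music
]

# Stand-in for a contemporary English lexicon used to validate candidates.
MODERN_LEXICON: set[str] = {"goes", "makes", "speaks", "loves", "music", "public"}


def candidates(word: str) -> list[str]:
    """Apply every matching suffix rule and return the distinct rewritten forms."""
    found: list[str] = []
    for old, new in SUFFIX_RULES:
        if word.endswith(old) and len(word) > len(old):
            form = word[: -len(old)] + new
            if form not in found:
                found.append(form)
    return found


def validate(word: str) -> list[str]:
    """Keep only the candidates attested in the modern lexicon."""
    return [c for c in candidates(word) if c in MODERN_LEXICON]


for w in ["speaketh", "goeth", "musick", "wherefore"]:
    print(w, "->", validate(w))
```

Over-generating and then filtering keeps the rules simple: "goeth" yields both "goes" and "gos", and the lexicon discards the second. An empty result (as for "wherefore") flags the word for manual review instead of forcing a guess.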
Taxonomy
Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Topic Modeling
