Parsing Early Modern English for Linguistic Search
Seth Kulick, Neville Ryant

TL;DR
This paper explores how recent NLP advancements, including word embeddings and parsing, can significantly expand the scope of research in historical English syntax by automatically annotating large corpora.
Contribution
It demonstrates the application of modern NLP tools like ELMo embeddings for POS tagging and parsing in early modern English, enabling large-scale linguistic analysis.
Findings
Improved accuracy of POS tagging and parsing on historical English.
Enhanced search capabilities over large annotated corpora.
Potential for large-scale linguistic research in historical syntax.
Abstract
We investigate the question of whether advances in NLP over the last few years make it possible to vastly increase the size of data usable for research in historical syntax. This brings together many of the usual tools in NLP - word embeddings, tagging, and parsing - in the service of linguistic queries over automatically annotated corpora. We train a part-of-speech (POS) tagger and parser on a corpus of historical English, using ELMo embeddings trained over a billion words of similar text. The evaluation is based on the standard metrics, as well as on the accuracy of the query searches using the parsed data.
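The abstract says the evaluation also covers the accuracy of query searches over the parsed data, but it does not say how that accuracy is scored. One plausible reading, offered only as a minimal sketch, is to compare the set of sentences a query matches in the gold trees with the set it matches in the automatically parsed trees. The function name `query_prf`, the sentence IDs, and the choice of precision, recall, and F1 are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Hashable, Set, Tuple


def query_prf(gold_hits: Set[Hashable],
              parsed_hits: Set[Hashable]) -> Tuple[float, float, float]:
    """Score one linguistic query (hypothetical helper, not from the paper).

    gold_hits:   IDs of sentences the query matches in gold-annotated trees.
    parsed_hits: IDs of sentences the query matches in parser output.
    Returns (precision, recall, F1) over the matched sentence IDs.
    """
    true_pos = len(gold_hits & parsed_hits)
    precision = true_pos / len(parsed_hits) if parsed_hits else 0.0
    recall = true_pos / len(gold_hits) if gold_hits else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up sentence IDs, for illustration only.
gold = {"s1", "s4", "s7", "s9"}
parsed = {"s1", "s4", "s8"}
precision, recall, f1 = query_prf(gold, parsed)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring at the level of matched sentences, rather than individual brackets or tags, reflects what a linguist running the query would actually see: which hits are missing and which are spurious.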
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo
