Parsing Early Modern English for Linguistic Search

Seth Kulick; Neville Ryant

arXiv:2002.10546 · cs.CL · February 26, 2020 · 1 citation

TL;DR

This paper investigates whether recent NLP advances in word embeddings, tagging, and parsing can greatly increase the amount of data usable for research in historical English syntax by automatically annotating large corpora.

Contribution

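The authors train a part-of-speech (POS) tagger and a parser on a corpus of historical English, using ELMo embeddings trained on a billion words of similar text, and evaluate them both with standard metrics and by the accuracy of linguistic queries run over the parsed output.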

Findings

01 Improved accuracy of POS tagging and parsing on historical English.

02 Enhanced search capabilities over large annotated corpora.

03 Potential for large-scale linguistic research in historical syntax.

Abstract

We investigate the question of whether advances in NLP over the last few years make it possible to vastly increase the size of data usable for research in historical syntax. This brings together many of the usual tools in NLP - word embeddings, tagging, and parsing - in the service of linguistic queries over automatically annotated corpora. We train a part-of-speech (POS) tagger and parser on a corpus of historical English, using ELMo embeddings trained over a billion words of similar text. The evaluation is based on the standard metrics, as well as on the accuracy of the query searches using the parsed data.
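
The abstract mentions evaluating "the accuracy of the query searches using the parsed data" but does not give the formula. One plausible reading, offered here only as a sketch, is to run the same query over gold-annotated and automatically parsed versions of a text and compare the sets of matching sentences. The function name `query_scores`, the use of sentence IDs, and the choice of precision, recall, and F1 are assumptions made for illustration, not details taken from the paper.

```python
from typing import Iterable, Set, Tuple


def query_scores(gold_hits: Iterable[str], parsed_hits: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of query hits on parsed data against gold hits.

    Both arguments are IDs of sentences matched by one query: `gold_hits` from
    the hand-annotated trees, `parsed_hits` from the automatically parsed trees.
    This set-based comparison is an assumption, not the paper's stated method.
    """
    gold: Set[str] = set(gold_hits)
    pred: Set[str] = set(parsed_hits)
    true_pos = len(gold & pred)  # sentences found in both versions
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical sentence IDs matched by a single query.
gold = ["s1", "s2", "s3", "s4"]
parsed = ["s2", "s3", "s4", "s7"]
precision, recall, f1 = query_scores(gold, parsed)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # prints "P=0.75 R=0.75 F1=0.75"
```

Scoring at the level of query hits, rather than only per-token or per-bracket, reflects the end use the abstract describes: a parser error matters to the linguist mainly when it causes a search to miss a sentence or return a spurious one.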

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo