Penn-Helsinki Parsed Corpus of Early Modern English: First Parsing Results and Analysis
Seth Kulick, Neville Ryant, Beatrice Santorini

TL;DR
This paper reports the first parsing results on the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). Using a modified neural parser, it shows that syntactic structure can be recovered accurately enough to support linguistic research on language change.
Contribution
It presents the first parsing results on the PPCEME corpus and adapts a neural parser to the corpus's distinctive features, advancing computational historical linguistics.
Findings
A modified Berkeley Neural Parser achieves promising accuracy on PPCEME.
The function tag recovery approach of Gabbard et al. (2006) works well for most tags.
Some function tags, such as the one marking direct speech, need additional work.
Abstract
We present the first parsing results on the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), a 1.9 million word treebank that is an important resource for research in syntactic change. We describe key features of PPCEME that make it challenging for parsing, including a larger and more varied set of function tags than in the Penn Treebank. We present results for this corpus using a modified version of the Berkeley Neural Parser and the approach to function tag recovery of Gabbard et al. (2006). Despite its simplicity, this approach works surprisingly well, suggesting it is possible to recover the original structure with sufficient accuracy to support linguistic applications (e.g., searching for syntactic structures of interest). However, for a subset of function tags (e.g., the tag indicating direct speech), additional work is needed, and we discuss some further limits of…
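To make the notion of function tags and structure search concrete, the following is a minimal sketch in Python. It is not the parser or the tag-recovery method described in the paper. It only assumes that trees are written in Penn-style bracketed form and that a node label consists of a base category followed by hyphen-separated function tags. The example sentence, its labels, and the helper names `parse_tree`, `split_label`, and `find_with_tag` are illustrative assumptions, not taken from the corpus or from any library.

```python
import re

def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()

def split_label(label):
    """Split a label into (base category, list of function tags).

    Assumption: purely numeric suffixes are coindexation indices,
    not function tags, so they are dropped.
    """
    parts = label.split("-")
    tags = [p for p in parts[1:] if not p.isdigit()]
    return parts[0], tags

def find_with_tag(tree, tag):
    """Return the labels of all nodes that carry the given function tag."""
    label, children = tree
    hits = [label] if tag in split_label(label)[1] else []
    for child in children:
        if isinstance(child, tuple):
            hits.extend(find_with_tag(child, tag))
    return hits

# Illustrative tree; the labels are assumptions made for this example.
example = "(IP-MAT (NP-SBJ (PRO I)) (VBD saw) (NP-OB1 (PRO him)) (. .))"
tree = parse_tree(example)
subjects = find_with_tag(tree, "SBJ")
```

A search of this kind depends on the function tags being present and correct in the parser output, which is why recovery accuracy for individual tags matters for linguistic applications.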
Taxonomy
Topics: Natural Language Processing Techniques · Linguistic Variation and Morphology · Second Language Acquisition and Learning
