Universal Dependencies for Learner English
Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, Boris Katz

TL;DR
This paper introduces the first publicly available syntactic treebank for English as a Second Language (ESL). It provides manually annotated dependency trees for learner English sentences and measures how grammatical errors affect parsing performance.
Contribution
It presents the Treebank of Learner English (TLE), with detailed annotations and guidelines for ungrammatical language, enabling research on second language acquisition and NLP for learner English.
Findings
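POS tagging and dependency parsing are benchmarked on the TLE, and the effect of grammatical errors on parsing accuracy is measured.
The UD annotations are tied to the existing FCE error annotation, so each sentence has a full syntactic analysis for both its original and its error-corrected version.
The accompanying ESL annotation guidelines allow consistent syntactic treatment of ungrammatical English, supporting linguistic and computational research on second language acquisition.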
Abstract
We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error corrected versions of each sentence. Further on, we delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. Finally, we benchmark POS tagging and dependency parsing performance on the TLE dataset and measure the effect of grammatical errors on parsing accuracy. We envision the treebank to support a wide range of linguistic and computational research on second language acquisition as well…
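The abstract's benchmark of dependency parsing accuracy can be illustrated with a small sketch. It assumes the trees are stored in the standard ten-column CoNLL-U format and that accuracy is reported as unlabeled and labeled attachment scores (UAS and LAS), the usual metrics for dependency parsing; the summary above does not name the metrics, so treat that as an assumption. The function names `read_conllu` and `attachment_scores` and the toy sentence are invented for illustration and are not taken from the TLE.

```python
def read_conllu(text):
    """Parse CoNLL-U text into sentences of (form, upos, head, deprel) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence-level comment
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():         # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        current.append((cols[1], cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold, pred):
    """Return (UAS, LAS) over two aligned lists of parsed sentences."""
    total = correct_heads = correct_labeled = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (_, _, g_head, g_rel), (_, _, p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_heads += 1        # head is right: counts toward UAS
                if g_rel == p_rel:
                    correct_labeled += 1  # head and label are right: counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return correct_heads / total, correct_labeled / total


def make_conllu(rows):
    """Build CoNLL-U text from (id, form, upos, head, deprel) rows (hypothetical helper)."""
    lines = ["\t".join([str(i), form, "_", upos, "_", "_", str(head), rel, "_", "_"])
             for i, form, upos, head, rel in rows]
    return "\n".join(lines) + "\n"


# Invented learner-style sentence with an agreement error: "She sleep well ."
gold = read_conllu(make_conllu([
    (1, "She", "PRON", 2, "nsubj"),
    (2, "sleep", "VERB", 0, "root"),
    (3, "well", "ADV", 2, "advmod"),
    (4, ".", "PUNCT", 2, "punct"),
]))
# Hypothetical parser output: one wrong label (token 3) and one wrong head (token 4).
pred = read_conllu(make_conllu([
    (1, "She", "PRON", 2, "nsubj"),
    (2, "sleep", "VERB", 0, "root"),
    (3, "well", "ADV", 2, "dep"),
    (4, ".", "PUNCT", 3, "punct"),
]))

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # prints "UAS=0.75 LAS=0.50"
```

Scoring the same parser once on the original sentences and once on their error-corrected versions, and comparing the two pairs of scores, is one simple way to quantify the effect of grammatical errors that the abstract describes.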
