CGELBank: CGEL as a Framework for English Syntax Annotation
Brett Reynolds, Aryaman Arora, Nathan Schneider

TL;DR
This paper presents CGELBank, a new English syntax treebank that applies the formalism of the Cambridge Grammar of the English Language (CGEL), which the authors argue balances comprehensiveness of analysis against usability for annotation.
Contribution
It introduces the CGEL formalism to treebanking, compares it with existing frameworks, and discusses its advantages for linguistic annotation and future automation.
Findings
CGEL provides a good tradeoff between comprehensiveness of analysis and usability for annotation.
Quantitative and qualitative comparisons with parallel UD and PTB treebanks highlight CGEL's strengths.
The project sets the stage for automatic conversion and expanded corpus annotation.
Abstract
We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project. We discuss some issues in linguistic analysis that arose in adapting the formalism to corpus annotation, followed by quantitative and qualitative comparisons with parallel UD and PTB treebanks. We argue that CGEL provides a good tradeoff between comprehensiveness of analysis and usability for annotation, which motivates expanding the treebank with automatic conversion in the future.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
