On the Role of Supervision in Unsupervised Constituency Parsing
Haoyue Shi, Karen Livescu, Kevin Gimpel

TL;DR
This paper shows that a supervised parser trained few-shot on a small amount of labeled data can outperform unsupervised constituency parsing models that are tuned on the same labeled data. The result highlights the role of supervision and the importance of data efficiency in parsing research.
Contribution
It introduces strong supervised baselines for unsupervised parsing models, trained on the same limited labeled data those models use for tuning, and proposes two protocols for fairer evaluation of future unsupervised parsing work.
Findings
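Few-shot supervised parsing, even with only 50 training and 5 development sentences, outperforms all the unsupervised parsing methods studied by a significant margin.
Data augmentation and self-training further improve few-shot parsing.
Even minimal labeled data is enough to reach competitive parsing performance.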
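Self-training is listed above as one way to improve few-shot parsing. The summary does not describe the authors' exact recipe, so the following is only a generic sketch under assumed interfaces. The names `self_train`, `train`, `predict`, and `rounds` are hypothetical. The loop fits a model on the gold examples, uses it to pseudo-label unlabeled inputs, and then refits on the union of both sets.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # input type, e.g. a sentence
Y = TypeVar("Y")  # output type, e.g. a parse tree
M = TypeVar("M")  # trained model type


def self_train(
    labeled: Sequence[Tuple[X, Y]],
    unlabeled: Sequence[X],
    train: Callable[[List[Tuple[X, Y]]], M],
    predict: Callable[[M, X], Y],
    rounds: int = 1,
) -> M:
    """Generic self-training loop (a hypothetical sketch, not the authors' exact procedure)."""
    model = train(list(labeled))  # 1. fit on the small gold set
    for _ in range(rounds):
        # 2. pseudo-label every unlabeled input with the current model
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # 3. refit on gold plus pseudo-labeled examples
        model = train(list(labeled) + pseudo)
    return model


# Toy stand-ins (hypothetical): a "model" that is just its majority training label.
def train_majority(data: List[Tuple[int, str]]) -> str:
    return Counter(y for _, y in data).most_common(1)[0][0]


def predict_majority(model: str, x: int) -> str:
    return model


print(self_train([(1, "a"), (2, "a"), (3, "b")], [4, 5], train_majority, predict_majority))  # prints a
```

In a parsing setting, `train` might wrap a supervised parser such as that of Kitaev and Klein (2018) fit on the few labeled trees, and `predict` would parse one unlabeled sentence. Common refinements, such as keeping only confident pseudo-labels, are deliberately omitted from this sketch.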
Abstract
We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing score on the Wall Street Journal (WSJ) development set (1,700 sentences). We introduce strong baselines for them by training an existing supervised parsing model (Kitaev and Klein, 2018) on the same labeled examples they access. When training on the 1,700 examples, or even when using only 50 examples for training and 5 for development, such a few-shot parsing approach can outperform all the unsupervised parsing methods by a significant margin. Few-shot parsing can be further improved by a simple data augmentation method and self-training. This suggests that, in order to arrive at fair conclusions, we should carefully consider the amount of labeled data used for model development. We propose two protocols for future work on unsupervised parsing: (i) use fully…
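The abstract mentions tuning on the parsing score of the WSJ development set. Much of the unsupervised parsing literature computes this score as unlabeled bracketing F1 over constituent spans. The sketch below assumes that metric, with trees given as nested Python lists of words. It also assumes that trivial spans (single words and the whole sentence) are ignored. The function names are hypothetical, and real evaluation scripts differ in details such as punctuation handling and sentence-level versus corpus-level averaging.

```python
from typing import List, Set, Tuple, Union

# A tree is either a word (leaf) or a list of subtrees; constituent labels are ignored.
Tree = Union[str, List["Tree"]]
Span = Tuple[int, int]


def collect_spans(tree: Tree, start: int = 0) -> Tuple[Set[Span], int]:
    """Return every (start, end) span covered by an internal node, plus the end index."""
    if isinstance(tree, str):
        return set(), start + 1  # a word occupies one position but adds no span here
    found: Set[Span] = set()
    pos = start
    for child in tree:
        child_spans, pos = collect_spans(child, pos)
        found |= child_spans
    found.add((start, pos))
    return found, pos


def nontrivial_spans(tree: Tree) -> Set[Span]:
    """Assumed convention: drop single-word spans and the whole-sentence span."""
    all_spans, length = collect_spans(tree)
    return {(i, j) for (i, j) in all_spans if j - i > 1 and (i, j) != (0, length)}


def unlabeled_f1(pred: Tree, gold: Tree) -> float:
    """Sentence-level unlabeled bracketing F1 between a predicted and a gold tree."""
    p, g = nontrivial_spans(pred), nontrivial_spans(gold)
    if not p and not g:
        return 1.0  # assumed convention when neither tree has a nontrivial span
    overlap = len(p & g)
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


gold = [["the", "cat"], ["sat", ["on", ["the", "mat"]]]]
right_branching = ["the", ["cat", ["sat", ["on", ["the", "mat"]]]]]
print(unlabeled_f1(right_branching, gold))  # prints 0.75
```

In this example, the right-branching tree and the gold tree each have four nontrivial spans, and three of them match. Precision and recall are therefore both 3/4, which gives F1 = 0.75.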
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
