A Critical Analysis of Biased Parsers in Unsupervised Parsing
Chris Dyer, Gábor Melis, Phil Blunsom

TL;DR
This paper critically examines the parsing algorithm of Shen et al. (2018), which is widely used in unsupervised parsing. It shows that the algorithm is incomplete and favors right-branching structures, and that this bias can lead researchers to overestimate the syntactic knowledge of language models.
Contribution
The paper provides a detailed analysis of the Shen et al. (2018) parsing algorithm, highlighting its incompleteness and bias towards right-branching trees, and discusses implications for evaluating language models.
Findings
Syntactic-depth proxies from a conventional LSTM language model yield trees comparable to those from specialized architectures.
The parsing algorithm is incomplete, recovering only a subset of possible trees.
It is biased toward right-branching structures, which inflates performance on right-branching languages like English.
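The summary states that the parser recovers only a fraction of possible trees but does not spell out the procedure. The sketch below assumes a greedy top-down rule (our reading, not code from the paper). It finds the word with the largest proxy depth. Words to its left become the left subtree, and that word heads the right subtree together with everything after it. It then enumerates every ordering of depths to see which binary trees the rule can produce at all. The helper names `build_tree`, `all_binary_trees`, and `reachable_trees` are hypothetical.

```python
from itertools import permutations
from math import comb


def build_tree(depth, words):
    """Assumed greedy rule: split at the deepest word (leftmost on ties, like
    numpy.argmax); words to its left form the left subtree, and the split
    word heads the right subtree together with everything after it."""
    if len(words) == 1:
        return words[0]
    i = max(range(len(depth)), key=lambda j: depth[j])
    if i == len(words) - 1:
        right = words[i]
    else:
        right = (words[i], build_tree(depth[i + 1:], words[i + 1:]))
    if i == 0:
        return right
    return (build_tree(depth[:i], words[:i]), right)


def all_binary_trees(words):
    """Every binary bracketing of `words`; there are Catalan(n - 1) of them."""
    if len(words) == 1:
        return [words[0]]
    return [(left, right)
            for k in range(1, len(words))
            for left in all_binary_trees(words[:k])
            for right in all_binary_trees(words[k:])]


def reachable_trees(words):
    """Trees the parser can output. Only the relative order of depths matters
    (ties behave like some strict order), so trying every permutation of
    distinct depths covers every possible input."""
    return {build_tree(p, words) for p in permutations(range(len(words)))}


for n in range(2, 8):
    words = [chr(ord("a") + k) for k in range(n)]
    catalan = comb(2 * (n - 1), n - 1) // n
    assert catalan == len(all_binary_trees(words))
    print(n, "words:", len(reachable_trees(words)), "of", catalan, "trees reachable")
```

Under this assumed rule, all trees are reachable up to four words. For five words it finds 13 of the 14 binary trees. The missing one is ((a b) ((c d) e)), because a right sibling built this way is always either a single word or a word paired with the rest of the span.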
Abstract
A series of recent papers has used a parsing algorithm due to Shen et al. (2018) to recover phrase-structure trees based on proxies for "syntactic depth." These proxy depths are obtained from the representations learned by recurrent language models augmented with mechanisms that encourage the (unsupervised) discovery of hierarchical structure latent in natural language sentences. Using the same parser, we show that proxies derived from a conventional LSTM language model produce trees comparably well to the specialized architectures used in previous work. However, we also provide a detailed analysis of the parsing algorithm, showing (1) that it is incomplete---that is, it can recover only a fraction of possible trees---and (2) that it has a marked bias for right-branching structures which results in inflated performance in right-branching languages like English. Our analysis shows that…
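One simple way to see an asymmetry like the right-branching bias described above is to feed the parser uninformative depths, with every ordering equally likely, and count how often it returns the fully right-branching versus the fully left-branching tree. The sketch below assumes the same greedy rule used for illustration elsewhere on this page: split at the deepest word and attach that word to the front of the right subtree. This is not necessarily the analysis carried out in the paper, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def build_tree(depth, words):
    """Assumed greedy rule: split at the deepest word (leftmost on ties) and
    attach that word to the front of the right-hand subtree."""
    if len(words) == 1:
        return words[0]
    i = max(range(len(depth)), key=lambda j: depth[j])
    if i == len(words) - 1:
        right = words[i]
    else:
        right = (words[i], build_tree(depth[i + 1:], words[i + 1:]))
    if i == 0:
        return right
    return (build_tree(depth[:i], words[:i]), right)


def right_branching(words):
    """(w1 (w2 (... (w_{n-1} w_n))))"""
    tree = words[-1]
    for w in reversed(words[:-1]):
        tree = (w, tree)
    return tree


def left_branching(words):
    """((((w1 w2) w3) ...) w_n)"""
    tree = words[0]
    for w in words[1:]:
        tree = (tree, w)
    return tree


def branching_shares(n):
    """Share of all n! depth orderings (uniformly random, uninformative depths)
    for which the parser outputs each fully branching tree; needs n >= 3."""
    words = [chr(ord("a") + k) for k in range(n)]
    targets = {"right": right_branching(words), "left": left_branching(words)}
    counts = {name: 0 for name in targets}
    for p in permutations(range(n)):
        tree = build_tree(p, words)
        for name, target in targets.items():
            if tree == target:
                counts[name] += 1
    return {name: Fraction(c, factorial(n)) for name, c in counts.items()}


for n in range(3, 8):
    shares = branching_shares(n)
    print(n, "right:", shares["right"], "left:", shares["left"])
```

With three words, two thirds of the orderings yield (a (b c)) and one third ((a b) c). With four words the split is 10/24 versus 2/24. A mirror-symmetric splitting rule would give the two shapes equal shares under this uniform distribution, so the gap comes from where the split word is attached.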
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
