A Critical Analysis of Biased Parsers in Unsupervised Parsing

Chris Dyer; Gábor Melis; Phil Blunsom

arXiv:1909.09428·cs.CL·September 23, 2019·22 cites

TL;DR

This paper critically examines the unsupervised parsing algorithm of Shen et al. (2018) and shows that it is incomplete and biased toward right-branching structures, which can lead to overestimating how much syntax a language model has learned.

Contribution

The paper provides a detailed analysis of the Shen et al. (2018) parsing algorithm, highlighting its incompleteness and bias towards right-branching trees, and discusses implications for evaluating language models.

Findings

01

Syntactic-depth proxies derived from a conventional LSTM language model produce trees comparable to those from specialized architectures.

02

The parsing algorithm is incomplete, recovering only a subset of possible trees.

03

The algorithm has a marked bias towards right-branching structures, which inflates performance on right-branching languages like English.

Abstract

A series of recent papers has used a parsing algorithm due to Shen et al. (2018) to recover phrase-structure trees based on proxies for "syntactic depth." These proxy depths are obtained from the representations learned by recurrent language models augmented with mechanisms that encourage the (unsupervised) discovery of hierarchical structure latent in natural language sentences. Using the same parser, we show that proxies derived from a conventional LSTM language model produce trees comparably well to the specialized architectures used in previous work. However, we also provide a detailed analysis of the parsing algorithm, showing (1) that it is incomplete (that is, it can recover only a fraction of possible trees) and (2) that it has a marked bias for right-branching structures, which results in inflated performance in right-branching languages like English. Our analysis shows that…
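The two properties named in the abstract can be illustrated with a small sketch. It assumes one common reading of a greedy top-down depth-based parser: split at the position with the highest proxy score, attach everything to its left as one constituent, and group the chosen word with everything to its right. The split rule, the leftmost tie-breaking, and the names `build_tree` and `reachable_trees` are assumptions made for illustration, not the authors' implementation.

```python
from itertools import permutations
from math import comb


def build_tree(depths, words):
    """Greedy top-down parse (illustrative assumption, not the paper's code).

    Splits at the highest-scoring position i and returns
    (left_subtree, (words[i], right_subtree)), dropping empty parts.
    """
    assert len(depths) == len(words) and words
    if len(words) == 1:
        return words[0]
    # max() returns the first maximal index, so ties go to the leftmost word.
    i = max(range(len(depths)), key=depths.__getitem__)
    node = words[i]
    if i + 1 < len(words):
        # The chosen word is always grouped with the material to its right.
        node = (words[i], build_tree(depths[i + 1:], words[i + 1:]))
    if i > 0:
        node = (build_tree(depths[:i], words[:i]), node)
    return node


def reachable_trees(n):
    """All trees the sketch can emit for n words, over every score ordering."""
    words = [chr(ord("a") + k) for k in range(n)]
    return {build_tree(list(p), words) for p in permutations(range(n))}


# Uninformative (constant) scores already yield a fully right-branching tree.
flat = build_tree([0.0] * 4, list("abcd"))
print(flat)  # ('a', ('b', ('c', 'd')))

# Incompleteness: compare reachable trees with all binary trees on 5 leaves.
trees5 = reachable_trees(5)
n_all5 = comb(8, 4) // 5  # Catalan number C_4 = 14
missing = (("a", "b"), (("c", "d"), "e"))
print(len(trees5), n_all5, missing in trees5)  # 13 14 False
```

Under these assumptions, the right sibling of any multi-word left constituent must begin with a single word, so a bracketing such as ((a b) ((c d) e)) is never produced regardless of the scores. Constant scores returning a right-branching tree shows how such a parser could score well on English without informative depths.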

Code & Models

Repositories

galsang/trees_from_transformers
pytorch

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory