Unsupervised Parsing via Constituency Tests

Steven Cao; Nikita Kitaev; Dan Klein

arXiv:2010.03146 · cs.CL · October 8, 2020

TL;DR

This paper introduces an unsupervised parser that scores spans with constituency tests judged by a neural acceptability model, then improves accuracy through iterative refinement, reaching 62.8 F1 on the Penn Treebank.

Contribution

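It presents an unsupervised parsing approach that scores each span by aggregating grammaticality judgments over a set of constituency-test transformations, together with a refinement procedure that alternates between improving the estimated trees and fine-tuning the grammaticality model.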

Findings

01  Achieves 62.8 F1 on the Penn Treebank

02  Outperforms the previous best published result by 7.6 F1 points

03  Demonstrates the effectiveness of parsing based on constituency tests

Abstract

We propose a method for unsupervised parsing based on the linguistic notion of a constituency test. One type of constituency test involves modifying the sentence via some transformation (e.g. replacing the span with a pronoun) and then judging the result (e.g. checking if it is grammatical). Motivated by this idea, we design an unsupervised parser by specifying a set of transformations and using an unsupervised neural acceptability model to make grammaticality decisions. To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score. While this approach already achieves performance in the range of current methods, we further improve accuracy by fine-tuning the grammaticality model through a refinement procedure, where we alternate between improving the estimated trees and improving the…
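
The tree-selection step described in the abstract can be illustrated with a small dynamic program. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`pronoun_test`, `span_score`, `best_binary_tree`, `render`), the use of averaging as the aggregation rule, the toy acceptability judge, and the hand-picked span scores are all hypothetical and chosen only to make the example runnable.

```python
from functools import lru_cache
from statistics import mean


def pronoun_test(words, i, j):
    """Illustrative transformation: replace the span words[i:j] with a pronoun."""
    return words[:i] + ["it"] + words[j:]


def span_score(words, i, j, tests, judge):
    """Aggregate the judgments for one span (averaging is an assumption here)."""
    return mean([judge(test(words, i, j)) for test in tests])


def best_binary_tree(n, score):
    """Return (total, tree) for the binary tree over n words that maximizes
    the sum of score(i, j) over all of its spans [i, j)."""

    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:
            return score(i, j), i  # a leaf is represented by its word index
        # Try every split point and keep the one with the best pair of subtrees.
        k = max(range(i + 1, j), key=lambda s: best(i, s)[0] + best(s, j)[0])
        (left_total, left), (right_total, right) = best(i, k), best(k, j)
        return score(i, j) + left_total + right_total, (left, right)

    return best(0, n)


def render(tree, words):
    """Render a nested-tuple tree as a bracketed string."""
    if isinstance(tree, int):
        return words[tree]
    return "(" + " ".join(render(child, words) for child in tree) + ")"


words = "the cat sat on the mat".split()

# Toy stand-in for the neural acceptability model (hypothetical lookup table).
ACCEPTABLE = {"it sat on the mat", "the cat sat on it"}


def toy_judge(sentence_words):
    return 1.0 if " ".join(sentence_words) in ACCEPTABLE else 0.0


# Score of the span "the cat" under the single pronoun test.
subject_score = span_score(words, 0, 2, [pronoun_test], toy_judge)

# Hand-picked span scores, purely illustrative; unlisted spans score 0.
TOY_SCORES = {(0, 6): 1.0, (0, 2): 1.0, (2, 6): 0.7, (3, 6): 0.9, (4, 6): 0.8}
total, tree = best_binary_tree(len(words), lambda i, j: TOY_SCORES.get((i, j), 0.0))
bracketing = render(tree, words)
print(bracketing)
```

With these toy scores the selected bracketing is `((the cat) (sat (on (the mat))))`, the only binary tree that contains all five positively scored spans. The recursion is the standard CKY-style search over split points, so it considers O(n^2) spans and O(n) splits per span.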

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification