Co-training an Unsupervised Constituency Parser with Weak Supervision
Nickil Maveli, Shay B. Cohen

TL;DR
This paper presents a novel unsupervised parsing method that uses co-training of inside and outside classifiers with weak supervision, achieving state-of-the-art results across multiple languages.
Contribution
It introduces a co-training approach with weak supervision and seed bootstrapping for unsupervised constituency parsing, improving accuracy and generalization.
Findings
Achieved 63.1 F1 on the English PTB test set.
Set new state-of-the-art results on Chinese and Japanese treebanks.
Demonstrated effectiveness of weak supervision with prior linguistic knowledge.
Abstract
We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, lets them parse effectively. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F1 on the English (PTB) test set. In addition, we show the effectiveness of our…
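The inside/outside split described in the abstract can be illustrated with a minimal Python sketch. It only shows how a candidate span divides a sentence into the two views the classifiers would see, plus one possible reading of a right-branching prior as a set of seed spans. The function names `split_views` and `right_branching_spans` are hypothetical, and the paper's actual features, classifiers, and seed heuristics are not reproduced here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token interval [i, j)


def split_views(tokens: List[str], i: int, j: int) -> Tuple[List[str], Tuple[List[str], List[str]]]:
    """Split a sentence into the inside view (the span itself) and the
    outside view (left and right context) for the span tokens[i:j]."""
    if not 0 <= i < j <= len(tokens):
        raise ValueError("span must satisfy 0 <= i < j <= len(tokens)")
    inside = tokens[i:j]
    outside = (tokens[:i], tokens[j:])
    return inside, outside


def right_branching_spans(n: int) -> List[Span]:
    """Multi-token spans of a purely right-branching binary tree over n tokens.

    Assumption for illustration: these spans act as positive seed examples
    for a right-branching language such as English.
    """
    return [(i, n) for i in range(n - 1)]


tokens = ["the", "cat", "sat", "down"]
inside, outside = split_views(tokens, 1, 3)
seeds = right_branching_spans(len(tokens))
```

Here `inside` holds the tokens an inside classifier would score and `outside` holds the context an outside classifier would score. In a co-training setup, confident predictions from one view would be added as training labels for the other.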
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Test
