Improved Latent Tree Induction with Distant Supervision via Span Constraints
Zhiyang Xu, Andrew Drozdov, Jay Yoon Lee, Tim O'Gorman, Subendhu, Rongali, Dylan Finkbeiner, Shilpa Suresh, Mohit Iyyer, Andrew McCallum

TL;DR
This paper introduces a method that uses minimal span constraints from distant supervision to significantly improve unsupervised constituency parsing performance, making it more practical for real-world applications.
Contribution
The authors propose a novel approach leveraging span constraints from distant supervision to enhance latent tree induction in unsupervised parsing systems.
Findings
Span constraints improve WSJ parsing by over 5 F1 points.
Method extends effectively to biomedical text parsing.
Minimal span constraints can be derived from simple sources like Wikipedia.
Abstract
For over thirty years, researchers have developed and analyzed methods for latent tree induction as an approach for unsupervised syntactic parsing. Nonetheless, modern systems still do not perform well enough compared to their supervised counterparts to have any practical use as structural annotation of text. In this work, we present a technique that uses distant supervision in the form of span constraints (i.e. phrase bracketing) to improve performance in unsupervised constituency parsing. Using a relatively small number of span constraints we can substantially improve the output from DIORA, an already competitive unsupervised parsing system. Compared with full parse tree annotation, span constraints can be acquired with minimal effort, such as with a lexicon derived from Wikipedia, to find exact text matches. Our experiments show span constraints based on entities improves…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
