TL;DR
This paper introduces a graph-based representation and an $O(n^4)$ dynamic programming algorithm for parsing treebank structures that go beyond trees. Together they cover nearly all treebank structures, including traces and null elements.
Contribution
It proposes a graph representation to which phrase structure parses, including traces, can be converted reversibly, together with a dynamic programming algorithm that covers most treebank structures, including those with long-distance dependencies.
Findings
The dynamic program covers 97.3% of the Penn English Treebank
Provides an $O(n^4)$ parsing algorithm for directed, acyclic, one-endpoint-crossing graphs
Demonstrates a proof-of-concept parser recovering null elements and traces
Abstract
General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.
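The graph class named in the abstract can be illustrated with a small membership check. The sketch below is not the paper's $O(n^4)$ parser; it is a brute-force test, and it assumes the usual definition of one-endpoint-crossing (for every edge, all edges that cross it share a common endpoint), which this summary does not spell out. Edges are assumed to be `(head, dependent)` pairs of word positions, and the function names are hypothetical.

```python
def crosses(e, f):
    """True if edges e and f cross when drawn as arcs above the sentence.

    Direction is ignored; only the positions of the endpoints matter.
    """
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b


def is_one_endpoint_crossing(edges):
    """Assumed definition: for each edge, all edges crossing it share an endpoint."""
    for e in edges:
        crossing = [f for f in edges if crosses(e, f)]
        if not crossing:
            continue
        # Intersect the endpoint sets of every edge that crosses e.
        common = set(crossing[0])
        for f in crossing[1:]:
            common &= set(f)
        if not common:
            return False
    return True


def is_acyclic(edges):
    """Check that the directed graph has no cycle (Kahn's algorithm)."""
    nodes = {v for e in edges for v in e}
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for head, dep in edges:
        successors[head].append(dep)
        indegree[dep] += 1
    queue = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Every node is removed exactly when no cycle exists.
    return removed == len(nodes)


def in_graph_class(edges):
    """Directed, acyclic, and one-endpoint-crossing."""
    return is_acyclic(edges) and is_one_endpoint_crossing(edges)
```

For example, the edges `(0, 2)`, `(1, 3)`, `(1, 4)` pass the check because both edges crossing `(0, 2)` share endpoint 1, whereas `(0, 3)`, `(1, 4)`, `(2, 5)` fail because the two edges crossing `(1, 4)` have no endpoint in common.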
