Efficient Algorithms for Parsing the DOP Model
Joshua Goodman (Harvard University)

TL;DR
This paper introduces efficient, deterministic algorithms for Data-Oriented Parsing (DOP) that avoid both the exponential rule set and the Monte Carlo parsing used by earlier approaches. The reported results differ significantly from those of Bod (1993).
Contribution
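The paper presents a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar (PCFG), together with a deterministic parsing strategy that maximizes the expected number of correct constituents rather than the probability of a correct parse tree.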
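To make the motivation for a compact grammar concrete, the sketch below counts the tree fragments that a single treebank tree contributes. It assumes the usual DOP notion of a fragment (every child of an included node is kept, and each nonterminal child is either cut off or expanded further); that definition, the tuple encoding of trees, and the names `count_fragments` and `balanced` are illustrative assumptions, not details taken from this summary.

```python
from math import prod

# Assumed encoding: a nonterminal node is (label, [children]); a word is a plain str.

def count_fragments(node):
    """Count fragments rooted at `node`; a bare word roots no fragment."""
    if isinstance(node, str):
        return 0
    _, children = node
    # Each child is either left as a frontier (1 way) or expanded
    # by one of its own fragments (count_fragments(child) ways).
    return prod(count_fragments(child) + 1 for child in children)


def balanced(depth):
    """Build a balanced binary tree with a single placeholder label."""
    if depth == 0:
        return ("X", ["w"])
    return ("X", [balanced(depth - 1), balanced(depth - 1)])


if __name__ == "__main__":
    for depth in range(4):
        print(depth, count_fragments(balanced(depth)))
```

Under these assumptions the count at the root goes 1, 4, 25, 676 for depths 0 to 3, which is the kind of blow-up that makes enumerating one grammar rule per fragment impractical.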
Findings
Achieved a 97% crossing brackets rate
Achieved an 88% zero crossing brackets rate
Results differ significantly from those reported by Bod (1993) and are obtained with more efficient algorithms
Abstract
Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using the optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a…
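The idea of maximizing the expected number of correct constituents, rather than the probability of a whole tree, can be sketched as a small dynamic program. The sketch assumes that a posterior probability is already available for every labeled span (in practice such values might come from an inside-outside computation over the PCFG) and that trees are binary. The function name, the input format, and the toy numbers are hypothetical and are not the paper's implementation.

```python
from functools import lru_cache


def best_bracketing(n, posterior):
    """Pick the binary bracketing of n words with the highest summed posterior.

    `posterior` maps (start, end, label) to the assumed probability that
    this labeled constituent is correct. Returns (score, spans).
    """
    # Keep only the most probable label for each span.
    best_label = {}
    for (i, j, label), p in posterior.items():
        if (i, j) not in best_label or p > best_label[(i, j)][1]:
            best_label[(i, j)] = (label, p)

    @lru_cache(maxsize=None)
    def solve(i, j):
        label, p = best_label.get((i, j), (None, 0.0))
        here = ((i, j, label),) if label is not None else ()
        if j - i == 1:
            return p, here
        best_score, best_spans = -1.0, ()
        # Try every split point and keep the one with the largest expected count.
        for k in range(i + 1, j):
            left_score, left_spans = solve(i, k)
            right_score, right_spans = solve(k, j)
            if left_score + right_score > best_score:
                best_score = left_score + right_score
                best_spans = left_spans + right_spans
        return p + best_score, here + best_spans

    return solve(0, n)


if __name__ == "__main__":
    # Toy posteriors for a three-word sentence (made-up numbers).
    toy = {
        (0, 3, "S"): 1.0,
        (0, 1, "NP"): 0.9,
        (1, 2, "V"): 0.9,
        (2, 3, "NP"): 0.9,
        (1, 3, "VP"): 0.8,
        (0, 2, "X"): 0.3,
    }
    score, spans = best_bracketing(3, toy)
    print(score, spans)
```

Because the objective is a sum over individual constituents, the search is deterministic and needs no sampling, in contrast to a Monte Carlo search over whole parse trees.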
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
