Efficient Algorithms for Parsing the DOP Model

Joshua Goodman (Harvard University)

arXiv:cmp-lg/9604008 · cmp-lg · February 3, 2008 · 57 cites

TL;DR

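This paper introduces efficient, deterministic algorithms for Data-Oriented Parsing (DOP). It reduces the DOP model to a small, equivalent probabilistic context-free grammar and parses with a strategy that maximizes the expected number of correct constituents. Its results differ significantly from those previously reported by Bod (1993), and the approach is far more practical to implement.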

Contribution

It presents a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar, and a deterministic parsing strategy that maximizes the expected number of correct constituents rather than the probability of a correct parse tree.
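To make the reduction concrete, here is a minimal Python sketch of one way to build such a grammar, following our reading of the construction. Every node of every training tree gets a unique address. An addressed copy `A@j` of a label expands only as node j did, while the plain label `A` may start at any node labelled `A`. Rule probabilities are scaled by subtree counts, so that each DOP subtree headed by `A` receives probability 1/a, where a is the number of subtrees headed by `A` in the corpus. The tree encoding (nested tuples for binary nodes or preterminals), the `label@address` naming, and the helper `goodman_pcfg` are assumptions for illustration, not the paper's code.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import count


def goodman_pcfg(trees):
    """Build PCFG rules {(lhs, rhs_tuple): probability} from training trees.

    Assumed tree format: (label, word) for a preterminal, or
    (label, left_subtree, right_subtree) for a binary internal node.
    """
    addresses = count(1)
    nodes = []  # (label, address, a_j, [(rhs, weight), ...])

    def visit(tree):
        label, j = tree[0], next(addresses)
        if isinstance(tree[1], str):
            # Preterminal A@j -> word: exactly one subtree is headed here.
            nodes.append((label, j, 1, [((tree[1],), 1)]))
            return label, j, 1
        (b, k, b_k), (c, l, c_l) = visit(tree[1]), visit(tree[2])
        # Each child is either cut (plain label) or expanded (addressed copy);
        # the weight counts how many subtrees that choice covers.
        options = [
            ((b, c), 1),
            ((f"{b}@{k}", c), b_k),
            ((b, f"{c}@{l}"), c_l),
            ((f"{b}@{k}", f"{c}@{l}"), b_k * c_l),
        ]
        a_j = (b_k + 1) * (c_l + 1)  # number of subtrees headed by this node
        nodes.append((label, j, a_j, options))
        return label, j, a_j

    for tree in trees:
        visit(tree)

    # a[A]: total number of subtrees headed by label A in the whole corpus.
    a = defaultdict(int)
    for label, _, a_j, _ in nodes:
        a[label] += a_j

    rules = defaultdict(Fraction)
    for label, j, a_j, options in nodes:
        for rhs, weight in options:
            rules[(f"{label}@{j}", rhs)] += Fraction(weight, a_j)  # addressed copy
            rules[(label, rhs)] += Fraction(weight, a[label])      # plain label
    return dict(rules)


rules = goodman_pcfg([("S", ("NP", "john"), ("VP", ("V", "sees"), ("NP", "mary")))])
print(rules[("S", ("NP@2", "VP@3"))])
```

In this sketch each internal node contributes eight rules (four for its addressed copy, four for its plain label), so the grammar grows linearly with the treebank instead of with the exponential number of DOP subtrees.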

Findings

01

Achieved a 97% crossing brackets rate

02

Achieved an 88% zero crossing brackets rate

03

Results differ significantly from those reported by Bod (1993) but are comparable to other reported results, while using far more efficient algorithms
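The two bracketing figures can be computed as below, assuming the usual definitions. The crossing brackets rate is taken as the share of proposed constituents that cross no gold-standard constituent. The zero crossing brackets rate is taken as the share of sentences whose proposed parse has no crossing constituent at all. Spans are hypothetical half-open `(start, end)` word offsets, and `crossing_bracket_rates` is an illustrative scorer, not the evaluation code used in the paper.

```python
def crosses(a, b):
    """True if half-open spans a and b overlap without either containing the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def crossing_bracket_rates(sentences):
    """sentences: list of (guess_spans, gold_spans) pairs, spans as (start, end)."""
    total = non_crossing = zero_crossing = 0
    for guess, gold in sentences:
        # Number of proposed constituents that cross at least one gold constituent.
        crossing = sum(any(crosses(g, t) for t in gold) for g in guess)
        total += len(guess)
        non_crossing += len(guess) - crossing
        zero_crossing += int(crossing == 0)
    return non_crossing / total, zero_crossing / len(sentences)


rate, zero = crossing_bracket_rates([
    ([(0, 2), (0, 3)], [(1, 3), (0, 3)]),  # (0, 2) crosses gold (1, 3)
    ([(0, 2)], [(0, 2)]),                  # no crossings
])
```

Here two of the three proposed constituents cross nothing, and one of the two sentences has zero crossings.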

Abstract

Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using the optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a…
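The deterministic strategy can be sketched as a CKY-style dynamic program. The sketch assumes the posterior probability of each labelled span has already been computed. For the reduced grammar, that would mean running inside-outside and summing over a label `X` and all of its addressed copies `X@j`, a step not shown here. For each span, the program keeps the most probable label and picks the split that maximizes the total expected number of correct constituents. The function name and the `posterior` dictionary format are hypothetical.

```python
from functools import lru_cache


def max_expected_constituents(n, posterior):
    """Return (expected correct constituents, tree) for a sentence of n words.

    posterior: hypothetical dict {(start, end, label): probability} giving the
    posterior probability that `label` spans words start..end-1.
    """
    # Most probable label for each span.
    best = {}
    for (s, t, label), p in posterior.items():
        if (s, t) not in best or p > best[(s, t)][1]:
            best[(s, t)] = (label, p)

    @lru_cache(maxsize=None)
    def solve(s, t):
        label, p = best.get((s, t), (None, 0.0))
        if t - s == 1:
            return p, (label, s)
        # Split point maximizing the expected number of correct constituents.
        r = max(range(s + 1, t), key=lambda m: solve(s, m)[0] + solve(m, t)[0])
        (left_score, left), (right_score, right) = solve(s, r), solve(r, t)
        return p + left_score + right_score, (label, left, right)

    return solve(0, n)


posterior = {(0, 3, "S"): 1.0, (0, 1, "NP"): 0.9, (1, 3, "VP"): 0.7,
             (0, 2, "X"): 0.2, (1, 2, "V"): 0.8, (2, 3, "NP"): 0.6}
score, tree = max_expected_constituents(3, posterior)
```

With these made-up posteriors, the split after the first word wins (3.0 versus 2.5 for the two children), giving `("S", ("NP", 0), ("VP", ("V", 1), ("NP", 2)))` with an expected 4.0 correct constituents. Because the objective is a sum over spans, the program runs in O(n^3) time once the posteriors are known, with no Monte Carlo sampling.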

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems