Aspects of Pattern-Matching in Data-Oriented Parsing
Guy De Pauw

TL;DR
This paper reinterprets Data-Oriented Parsing as a pattern-matching model focused on maximizing substructure size, which simplifies computation and maintains accuracy by enhancing context sensitivity.
Contribution
It introduces a pattern-matching perspective to DOP, eliminating the need for multiple derivations and enabling more efficient Viterbi-style parsing algorithms.
Findings
Pattern-matching approach retains parsing accuracy
Eliminates double work in probabilistic derivations
Enables efficient Viterbi-style optimization
Abstract
Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy with the psycholinguistic insight that larger chunks of syntactic structure are relevant grammatical and probabilistic units. Parsing with the DOP model, however, seems to involve a lot of CPU cycles and a considerable amount of double work, brought on by the concept of multiple derivations, which is necessary for probabilistic processing but is not convincingly related to a proper linguistic backbone. It is, however, possible to reinterpret the DOP model as a pattern-matching model that tries to maximize the size of the substructures that construct the parse, rather than the probability of the parse. By emphasizing this memory-based aspect of the DOP model, it is possible to do away with multiple derivations, opening up possibilities for efficient Viterbi-style…
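The idea of preferring the largest remembered substructures can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it works on flat token sequences rather than trees, and the names `fewest_chunks` and `memory` are hypothetical. It only shows how a Viterbi-style dynamic program can pick the analysis built from the fewest (and therefore largest) stored chunks, without any probabilities.

```python
from typing import List, Optional, Set, Tuple

Chunk = Tuple[str, ...]


def fewest_chunks(tokens: List[str], memory: Set[Chunk]) -> Optional[List[Chunk]]:
    """Cover `tokens` with chunks from `memory`, using as few chunks as possible.

    Flat-string analogy (an assumption of this sketch): fewer chunks means
    larger chunks, mirroring the preference for large substructures.
    Returns None if no cover exists.
    """
    n = len(tokens)
    # best[i] holds a fewest-chunk cover of tokens[:i], or None if unreachable.
    best: List[Optional[List[Chunk]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prev = best[start]
            chunk = tuple(tokens[start:end])
            if prev is None or chunk not in memory:
                continue
            candidate = prev + [chunk]
            current = best[end]
            # Keep only the single best cover per position (Viterbi-style).
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


# Hypothetical toy memory of previously seen chunks.
memory = {("the",), ("dog",), ("barks",), ("the", "dog"), ("dog", "barks")}
cover = fewest_chunks(["the", "dog", "barks"], memory)
```

Because each position stores only one best cover, there is nothing to sum over: competing ways of building the same analysis are discarded as soon as a cover with fewer chunks is found.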
Taxonomy
Topics: Natural Language Processing Techniques
