Prepositional Phrase Attachment through a Backed-Off Model

Michael Collins; James Brooks (University of Pennsylvania)

arXiv:cmp-lg/9506021·cmp-lg·February 3, 2008·52 cites

Open Access

TL;DR

This paper applies the backed-off estimate from n-gram language modeling to prepositional phrase attachment ambiguity, reaching 84.5% accuracy on Wall Street Journal data and showing that low-count events contribute substantially to that result.

Contribution

It treats prepositional phrase attachment as analogous to n-gram language modeling, applies the backed-off estimate to the four head words (v, n1, p and n2), and evaluates the method on Wall Street Journal data.

Findings

01 Achieved 84.5% accuracy on WSJ data.

02 Ignoring events that occur fewer than 5 times in the training data reduces accuracy to 81.6%.

03 The backed-off estimate is effective for resolving prepositional phrase attachment ambiguity.

Abstract

Recent work has considered corpus-based or statistical approaches to the problem of prepositional phrase attachment ambiguity. Typically, ambiguous verb phrases of the form {v np1 p np2} are resolved through a model which considers values of the four head words (v, n1, p and n2). This paper shows that the problem is analogous to n-gram language models in speech recognition, and that one of the most common methods for language modeling, the backed-off estimate, is applicable. Results on Wall Street Journal data of 84.5% accuracy are obtained using this method. A surprising result is the importance of low-count events - ignoring events which occur less than 5 times in training data reduces performance to 81.6%.
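
The backed-off idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the procedure, so everything specific here is an assumption made for illustration: the back-off levels (quadruple, then triples, pairs, and the preposition alone, each keeping p), the 0.5 decision threshold, the default of noun attachment, and the names `LEVELS`, `train`, `p_noun`, `attach`, and `min_count`. The toy training data is invented and does not come from the paper.

```python
from collections import Counter

# Hypothetical back-off levels over the head-word tuple (v, n1, p, n2).
# Each entry lists the tuple positions that are kept; every sub-tuple
# keeps the preposition (position 2). This layout is an assumption.
LEVELS = [
    [(0, 1, 2, 3)],                      # full quadruple
    [(0, 1, 2), (0, 2, 3), (1, 2, 3)],   # triples
    [(0, 2), (1, 2), (2, 3)],            # pairs
    [(2,)],                              # preposition alone
]

def train(examples):
    """Count sub-tuples. Each example is ((v, n1, p, n2), label),
    where label 1 means noun attachment and 0 means verb attachment."""
    total, noun = Counter(), Counter()
    for heads, label in examples:
        for level in LEVELS:
            for idx in level:
                key = (idx, tuple(heads[i] for i in idx))
                total[key] += 1
                if label == 1:
                    noun[key] += 1
    return total, noun

def p_noun(heads, total, noun, min_count=1, default=1.0):
    """Estimate P(noun attachment) from the most specific level whose
    pooled count reaches min_count; otherwise back off to the next level."""
    for level in LEVELS:
        keys = [(idx, tuple(heads[i] for i in idx)) for idx in level]
        denom = sum(total[k] for k in keys)
        if denom >= min_count and denom > 0:
            return sum(noun[k] for k in keys) / denom
    return default  # nothing seen at any level: assumed default

def attach(heads, total, noun, min_count=1):
    """Return 'noun' or 'verb' using an assumed 0.5 threshold."""
    return "noun" if p_noun(heads, total, noun, min_count) >= 0.5 else "verb"

# Toy data, invented for illustration only.
examples = [
    (("ate", "pizza", "with", "fork"), 0),
    (("ate", "pizza", "with", "anchovies"), 1),
    (("saw", "man", "with", "telescope"), 0),
    (("bought", "shares", "of", "stock"), 1),
]
total, noun = train(examples)
print(attach(("ate", "pizza", "with", "fork"), total, noun))
print(attach(("bought", "bonds", "of", "company"), total, noun))
```

Raising `min_count` forces the sketch to skip sparse, specific levels and rely on coarser ones, which is one way to picture the low-count cutoff that the abstract reports lowering accuracy from 84.5% to 81.6%.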

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis