A Probabilistic Model of Compound Nouns
Mark Lauer (Microsoft Institute, Sydney), Mark Dras (Microsoft Institute, Sydney)

TL;DR
This paper introduces a probabilistic model for syntactically analyzing compound nouns. The model addresses data sparseness and sense ambiguity, and an implementation based on it correctly parses 77% of the test set.
Contribution
It presents a novel corpus-based probabilistic approach that incorporates semantic classes and sense disambiguation for compound noun parsing.
Findings
Achieves 77% accuracy in parsing test compounds.
Addresses data sparseness with semantic word classes.
Handles sense ambiguity explicitly in the model.
Abstract
Compound nouns such as "example noun compound" are becoming more common in natural language and pose a number of difficult problems for NLP systems, notably increasing the complexity of parsing. In this paper we develop a probabilistic model for syntactically analysing such compounds. The model predicts compound noun structures based on knowledge of affinities between nouns, which can be acquired from a corpus. Problems inherent in this corpus-based approach are addressed: data sparseness is overcome by the use of semantically motivated word classes, and sense ambiguity is explicitly handled in the model. An implementation based on this model is described in Lauer (1994) and correctly parses 77% of the test set.
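To make the idea of class-based affinities concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the model or implementation from the paper: the word-to-class table, the toy training pairs, the even splitting of counts across senses, and the decision rule (compare how strongly the first noun attaches to the second versus the third) are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical mapping from words to semantic classes (assumption, not from
# the paper). A word with several senses maps to several classes.
WORD_CLASSES = {
    "example": ["INSTANCE"],
    "noun": ["GRAMMAR"],
    "compound": ["GRAMMAR_UNIT", "CHEMICAL"],
}

# Toy two-noun (modifier, head) observations standing in for corpus data.
TRAINING_PAIRS = [
    ("example", "compound"),
    ("example", "compound"),
    ("noun", "compound"),
]


def class_pairs(modifier, head):
    """All (modifier class, head class) combinations for a word pair."""
    return list(product(WORD_CLASSES[modifier], WORD_CLASSES[head]))


def class_affinities(pairs):
    """Count modifier->head evidence at the class level.

    Each observation is split evenly across the possible class pairs,
    a simple (assumed) way of coping with sense ambiguity.
    """
    counts = {}
    for modifier, head in pairs:
        combos = class_pairs(modifier, head)
        for combo in combos:
            counts[combo] = counts.get(combo, 0.0) + 1.0 / len(combos)
    return counts


def affinity(counts, modifier, head):
    """Average class-level count over the possible senses of both words."""
    combos = class_pairs(modifier, head)
    return sum(counts.get(combo, 0.0) for combo in combos) / len(combos)


def bracket(counts, n1, n2, n3):
    """Pick a structure for the three-noun compound n1 n2 n3.

    "left" means [[n1 n2] n3]; "right" means [n1 [n2 n3]].
    """
    left = affinity(counts, n1, n2)
    right = affinity(counts, n1, n3)
    return "left" if left >= right else "right"


counts = class_affinities(TRAINING_PAIRS)
print(bracket(counts, "example", "noun", "compound"))  # prints "right"
```

Because counts are stored per class pair rather than per word pair, an unseen word pair can still receive evidence from other words in the same classes, which is the intuition behind using word classes against data sparseness.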
Taxonomy
Topics: Natural Language Processing Techniques
