A Probabilistic Model of Compound Nouns

Mark Lauer (Microsoft Institute, Sydney), Mark Dras (Microsoft Institute, Sydney)

arXiv:cmp-lg/9409003 · cmp-lg · February 3, 2008 · 6 cites

TL;DR

This paper develops a probabilistic model for syntactically analysing compound nouns, based on affinities between nouns acquired from a corpus. It addresses data sparseness and sense ambiguity, and an implementation of the model correctly parses 77% of the test set.

Contribution

It presents a novel corpus-based probabilistic approach that incorporates semantic classes and sense disambiguation for compound noun parsing.

Findings

01 An implementation of the model correctly parses 77% of the test set.

02 Addresses data sparseness with semantically motivated word classes.

03 Handles sense ambiguity explicitly in the model.

Abstract

Compound nouns such as "example noun compound" are becoming more common in natural language and pose a number of difficult problems for NLP systems, notably increasing the complexity of parsing. In this paper we develop a probabilistic model for syntactically analysing such compounds. The model predicts compound noun structures based on knowledge of affinities between nouns, which can be acquired from a corpus. Problems inherent in this corpus-based approach are addressed: data sparseness is overcome by the use of semantically motivated word classes and sense ambiguity is explicitly handled in the model. An implementation based on this model is described in Lauer (1994) and correctly parses 77% of the test set.
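
To make the idea of class-based affinities concrete, here is a minimal Python sketch of how a three-noun compound might be bracketed. It is an illustration under stated assumptions, not the implementation described in Lauer (1994): the toy lexicon `CLASSES`, the training pairs, the uniform split of each observation across a word's senses, and the decision rule (compare how strongly the first noun attaches to the second versus the third) are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy lexicon: noun -> semantic classes, one per sense.
# "glass" is ambiguous here (a material or a drinking vessel).
CLASSES = {
    "computer": ["MACHINE"],
    "science": ["KNOWLEDGE"],
    "department": ["ORGANISATION"],
    "plastic": ["MATERIAL"],
    "glass": ["MATERIAL", "CONTAINER"],
    "water": ["LIQUID"],
    "bottle": ["CONTAINER"],
}


def class_pairs(modifier, head):
    """All (modifier class, head class) combinations for a word pair."""
    return list(product(CLASSES[modifier], CLASSES[head]))


def train(pairs):
    """Count class-level affinities from observed (modifier, head) pairs.

    Assumption: each observation is split evenly across the possible
    sense combinations, since the corpus is not sense-tagged.
    """
    counts = defaultdict(float)
    for modifier, head in pairs:
        combos = class_pairs(modifier, head)
        for combo in combos:
            counts[combo] += 1.0 / len(combos)
    return counts


def affinity(counts, modifier, head):
    """Average class-level count over the sense combinations of the pair."""
    combos = class_pairs(modifier, head)
    return sum(counts.get(c, 0.0) for c in combos) / len(combos)


def bracket(counts, n1, n2, n3):
    """Return "left" for [[n1 n2] n3] or "right" for [n1 [n2 n3]].

    Assumption: ties default to left-branching.
    """
    left = affinity(counts, n1, n2)
    right = affinity(counts, n1, n3)
    return "left" if left >= right else "right"


# Hypothetical two-noun training observations.
counts = train([
    ("computer", "science"),
    ("science", "department"),
    ("plastic", "bottle"),
    ("water", "bottle"),
])

print(bracket(counts, "computer", "science", "department"))  # left
print(bracket(counts, "glass", "water", "bottle"))           # right
```

In the second call, "glass" never occurs in the training pairs, yet the compound is still bracketed because one of its senses shares the class MATERIAL with "plastic". This is the sense in which word classes mitigate data sparseness, and averaging over sense combinations is one simple way to account for ambiguity.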

Taxonomy

Topics: Natural Language Processing Techniques