Distinguishing Word Senses in Untagged Text

Ted Pedersen (Southern Methodist University); Rebecca Bruce (Southern Methodist University)

arXiv:cmp-lg/9706008 · cmp-lg · February 3, 2008 · 129 cites

TL;DR

This paper compares three unsupervised algorithms for word sense disambiguation in untagged text. McQuitty's similarity analysis with a high-dimensional feature set is the most accurate overall, and the methods disambiguate nouns more successfully than adjectives or verbs.

Contribution

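The paper presents an experimental comparison of three unsupervised methods for word sense disambiguation (McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm), all of which rely solely on features that can be identified automatically in text.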

Findings

01. McQuitty's similarity analysis is the most accurate of the three methods tested.

02. Disambiguation works better for nouns than for adjectives or verbs.

03. The most accurate configuration pairs McQuitty's similarity analysis with a high-dimensional feature set.

Abstract

This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high-dimensional feature set.
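To make the clustering idea concrete, here is a minimal sketch of agglomerative clustering with McQuitty's update rule, in which the distance from a newly merged cluster to any other cluster is the simple average of the two old distances (often called WPGMA). It is an illustration only, not the authors' implementation: the mismatch-count dissimilarity, the function names `mismatch_count` and `mcquitty_cluster`, and the toy feature vectors are all assumptions made for this example.

```python
from itertools import combinations


def mismatch_count(a, b):
    """Dissimilarity between two instances: number of features whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mcquitty_cluster(instances, n_clusters):
    """Agglomerative clustering with McQuitty's (WPGMA) update rule.

    Returns a sorted list of clusters, each a sorted list of instance indices.
    """
    # Each active cluster id maps to the instance indices it contains.
    members = {i: [i] for i in range(len(instances))}
    # Pairwise dissimilarities, keyed by (smaller id, larger id).
    dist = {
        (i, j): float(mismatch_count(instances[i], instances[j]))
        for i, j in combinations(range(len(instances)), 2)
    }
    next_id = len(instances)

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    while len(members) > n_clusters:
        # Merge the closest pair of active clusters (ties go to the first pair found).
        a, b = min(combinations(sorted(members), 2), key=lambda p: d(*p))
        new = next_id
        next_id += 1
        for k in members:
            if k not in (a, b):
                # McQuitty's rule: average the two old distances, ignoring cluster sizes.
                dist[(k, new)] = (d(a, k) + d(b, k)) / 2.0
        members[new] = sorted(members.pop(a) + members.pop(b))
    return sorted(members.values())


# Hypothetical feature vectors for five instances of one ambiguous word:
# (part of speech of the next word, two made-up co-occurrence flags).
instances = [
    ("NN", 1, 0),
    ("NN", 1, 1),
    ("VB", 0, 0),
    ("VB", 0, 1),
    ("NN", 1, 0),
]
clusters = mcquitty_cluster(instances, n_clusters=2)
print(clusters)
```

Setting `n_clusters` to the number of known senses stops the merging at that many groups. Mapping each resulting group to a specific sense definition is a separate step that this sketch does not cover.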

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Data Mining Algorithms and Applications