Distinguishing Word Senses in Untagged Text
Ted Pedersen (Southern Methodist University), Rebecca Bruce (Southern Methodist University)

TL;DR
This paper compares three unsupervised algorithms for word sense disambiguation in untagged text, finding that McQuitty's similarity analysis with a high-dimensional feature set performs best, especially for nouns.
Contribution
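It presents an experimental comparison of three unsupervised methods for word sense disambiguation (McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm) that rely only on automatically identifiable features of the text.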
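All three methods work from feature values that can be read directly off the text surrounding the ambiguous word. The following is a minimal sketch, not the paper's feature set: it assumes binary co-occurrence features over a small hand-picked vocabulary and compares two instances by counting the features on which they disagree. The function names, vocabulary, and example contexts are hypothetical, and the mismatch count is an assumed dissimilarity measure.

```python
from itertools import combinations


def cooccurrence_features(context, vocabulary):
    """Hypothetical feature extractor: 1 if a vocabulary word occurs in the context, else 0."""
    tokens = set(context.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]


def mismatch_matrix(vectors):
    """Symmetric matrix counting the feature positions where two instances differ."""
    n = len(vectors)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        mismatches = sum(a != b for a, b in zip(vectors[i], vectors[j]))
        d[i][j] = d[j][i] = mismatches
    return d


# Illustrative contexts for an ambiguous noun such as "interest" (not data from the paper).
vocabulary = ["rate", "bank", "percent", "hobby", "music"]
contexts = [
    "the bank raised the interest rate by one percent",
    "interest rate cuts at the bank",
    "her main interest is music",
]
vectors = [cooccurrence_features(c, vocabulary) for c in contexts]
dissimilarity = mismatch_matrix(vectors)
print(dissimilarity)  # [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
```

Real feature sets would be larger (the paper reports that a high-dimensional set worked best with McQuitty's method), but the resulting dissimilarity matrix has the same shape.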
Findings
McQuitty's similarity analysis is most accurate among the tested methods.
Disambiguation works better for nouns than for adjectives or verbs.
A high-dimensional feature set yields the best accuracy when paired with McQuitty's similarity analysis.
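McQuitty's similarity analysis and Ward's minimum-variance method are both agglomerative: every instance starts in its own cluster, and the two closest clusters are merged repeatedly. They differ only in how the dissimilarity between a newly merged cluster and each remaining cluster is recomputed. The sketch below is not the authors' code; it uses the Lance-Williams update for both. McQuitty's method takes the plain average of the two old dissimilarities, while Ward's update weights them by cluster size and assumes the inputs are squared Euclidean distances (for binary feature vectors, the mismatch count is exactly the squared Euclidean distance). Stopping at a fixed number of clusters, such as the number of known senses, and the function name `agglomerate` are assumptions.

```python
def agglomerate(dist, n_clusters, method="mcquitty"):
    """Agglomerative clustering on a symmetric dissimilarity matrix (hypothetical helper).

    method="mcquitty": merged-cluster dissimilarity is the average of the two
    old dissimilarities (weighted average linkage).
    method="ward": Lance-Williams update for Ward's minimum-variance criterion;
    assumes dist holds squared Euclidean distances.
    """
    # Active clusters: cluster id -> list of instance indices.
    clusters = {i: [i] for i in range(len(dist))}
    # Dissimilarities keyed by unordered pairs of cluster ids.
    d = {frozenset((i, j)): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(dist)
    while len(clusters) > n_clusters:
        # Closest pair; ties broken by the smaller sorted id pair for determinism.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        i, j = sorted(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        d_ij = d.pop(pair)
        for k in clusters:
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            if method == "mcquitty":
                new = (d_ki + d_kj) / 2
            elif method == "ward":
                nk = len(clusters[k])
                total = ni + nj + nk
                new = ((ni + nk) * d_ki + (nj + nk) * d_kj - nk * d_ij) / total
            else:
                raise ValueError(f"unknown method: {method}")
            d[frozenset((k, next_id))] = new
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Mismatch counts between four hypothetical binary feature vectors:
# [1,1,0,0], [1,1,1,0], [0,0,1,1], [0,0,0,1]
dist = [
    [0, 1, 4, 3],
    [1, 0, 3, 4],
    [4, 3, 0, 1],
    [3, 4, 1, 0],
]
print(agglomerate(dist, 2, "mcquitty"))  # [[0, 1], [2, 3]]
print(agglomerate(dist, 2, "ward"))      # [[0, 1], [2, 3]]
```

Because each merge rescans every remaining pair, this runs in roughly cubic time; it is meant to show the update rules, not to scale. The resulting clusters would still need to be mapped to the known sense definitions, a step not shown here.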
Abstract
This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high-dimensional feature set.
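Unlike the two clustering methods, the EM algorithm treats the sense as a hidden variable in a probabilistic model of the features and alternates between estimating model parameters and re-estimating each instance's sense distribution. The following is a minimal sketch under several assumptions not taken from the paper: a Naive Bayes model form (binary features conditionally independent given the sense), add-one smoothing, random initial sense assignments, and a fixed number of iterations instead of a convergence test. The function name `em_naive_bayes` is hypothetical.

```python
import math
import random


def em_naive_bayes(vectors, n_senses, iterations=50, seed=0):
    """EM for a mixture of independent Bernoulli features (Naive Bayes form).

    The sense is the hidden class. Returns the most probable sense index for
    each instance; index values are arbitrary (any permutation is equivalent).
    """
    rng = random.Random(seed)
    n_features = len(vectors[0])
    # Random soft start: resp[i][s] approximates P(sense s | instance i).
    resp = []
    for _ in vectors:
        weights = [rng.random() for _ in range(n_senses)]
        total = sum(weights)
        resp.append([w / total for w in weights])
    for _ in range(iterations):
        # M-step: sense priors and per-sense feature probabilities,
        # add-one smoothed so no estimate is exactly 0 or 1.
        counts = [sum(r[s] for r in resp) for s in range(n_senses)]
        priors = [(c + 1) / (len(vectors) + n_senses) for c in counts]
        theta = [
            [(sum(r[s] * x[f] for r, x in zip(resp, vectors)) + 1) / (counts[s] + 2)
             for f in range(n_features)]
            for s in range(n_senses)
        ]
        # E-step: posterior over senses for each instance, computed in log space.
        resp = []
        for x in vectors:
            logs = [
                math.log(priors[s])
                + sum(math.log(theta[s][f] if x[f] else 1 - theta[s][f])
                      for f in range(n_features))
                for s in range(n_senses)
            ]
            top = max(logs)
            weights = [math.exp(lp - top) for lp in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])
    return [max(range(n_senses), key=lambda s: r[s]) for r in resp]


# Two hypothetical groups of instances with disjoint active features.
vectors = [[1, 1, 1, 0, 0, 0] for _ in range(3)] + [[0, 0, 0, 1, 1, 1] for _ in range(3)]
labels = em_naive_bayes(vectors, n_senses=2)
print(labels)
```

On this toy input, the first three instances should share one sense index and the last three the other; which group gets index 0 depends on the random start. As with the clustering sketch, mapping these indices to the known sense definitions is a separate step.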
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Data Mining Algorithms and Applications
