Learning similarity-based word sense disambiguation from sparse data
Yael Karov, Shimon Edelman (The Weizmann Institute of Science)

TL;DR
This paper introduces an iterative, similarity-based method for word sense disambiguation that uses a text corpus and a machine-readable dictionary to learn typical usages for each sense, even from very sparse training data.
Contribution
It presents an iterative approach that defines word similarity and context similarity in terms of each other, resolves that circularity through a converging process, and disambiguates word senses from very sparse training data.
Findings
Method performs well with sparse data
Learns typical usages for word senses
Effective in disambiguating polysemous words
Abstract
We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method performs well, and can learn even from very sparse training data.
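The mutual definition in the abstract (words are similar if they appear in similar contexts; contexts are similar if they contain similar words) can be illustrated with a small sketch. This is a simplified illustration, not the authors' formulation: it assumes uniform word weights, a fixed number of iterations instead of a convergence test, and hand-labeled toy contexts in place of usages derived from an MRD. The function names (sim, context_affinity, learn_similarities, disambiguate) are hypothetical.

```python
from itertools import product


def sim(word_sim, w, v):
    """Look up word similarity; unseen pairs fall back to identity."""
    return word_sim.get((w, v), 1.0 if w == v else 0.0)


def context_affinity(ctx_a, ctx_b, word_sim):
    """Mean, over words in ctx_a, of the best match to any word in ctx_b."""
    return sum(max(sim(word_sim, w, v) for v in ctx_b) for w in ctx_a) / len(ctx_a)


def learn_similarities(contexts, iterations=3):
    """Alternate between context similarity and word similarity updates."""
    vocab = sorted({w for ctx in contexts for w in ctx})
    # Assumption: start from identity, so a word is similar only to itself.
    word_sim = {(w, v): 1.0 if w == v else 0.0 for w, v in product(vocab, vocab)}
    # Indices of the contexts in which each word occurs.
    occurs = {w: [i for i, ctx in enumerate(contexts) if w in ctx] for w in vocab}
    n = len(contexts)
    for _ in range(iterations):
        # Contexts are similar if they contain similar words.
        ctx_sim = {
            (i, j): context_affinity(contexts[i], contexts[j], word_sim)
            for i in range(n)
            for j in range(n)
        }
        # Words are similar if they appear in similar contexts.
        new_sim = {}
        for w, v in product(vocab, vocab):
            if w == v:
                new_sim[(w, v)] = 1.0
                continue
            best = [max(ctx_sim[(i, j)] for j in occurs[v]) for i in occurs[w]]
            new_sim[(w, v)] = sum(best) / len(best)
        word_sim = new_sim
    return word_sim


def disambiguate(new_ctx, labeled_contexts, word_sim):
    """Return the sense whose example context is most similar to new_ctx."""
    sense, _ = max(
        labeled_contexts,
        key=lambda item: context_affinity(new_ctx, item[1], word_sim),
    )
    return sense


# Toy contexts for the word "bank" (target word removed); made-up data.
corpus = [
    ["deposit", "money", "account"],
    ["loan", "money", "interest"],
    ["river", "water", "shore"],
    ["fishing", "water", "boat"],
]
word_sim = learn_similarities(corpus, iterations=2)
examples = [("river", corpus[2]), ("money", corpus[0])]
print(disambiguate(["interest"], examples, word_sim))
```

In this toy run, "interest" shares no word with either labeled example, so a plain word-overlap comparison would not separate the two senses. The learned word similarities link it to the money-related context through the corpus, which is the kind of generalization from sparse data that the abstract describes.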
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
