Learning similarity-based word sense disambiguation from sparse data

Yael Karov; Shimon Edelman (The Weizmann Institute of Science)

arXiv:cmp-lg/9605009 · cmp-lg · February 3, 2008 · 20 cites

Open Access

TL;DR

This paper introduces an iterative, similarity-based method for word sense disambiguation. Using a text corpus and a machine-readable dictionary, it learns a set of typical usages for each sense of a polysemous word, and it performs well even when the training data is very sparse.

Contribution

It presents a novel iterative approach that combines context and word similarity measures to disambiguate word senses using minimal training data.

Findings

01. Method performs well with sparse data

02. Learns typical usages for word senses

03. Effective in disambiguating polysemous words

Abstract

We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method performs well, and can learn even from very sparse training data.
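
The circular definition in the abstract (similar words appear in similar contexts, similar contexts contain similar words) can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' formulation: it uses uniform weights and plain averages, starts from the assumption that each word is similar only to itself, and runs a fixed number of iterations instead of testing for convergence. The function names (`context_similarity`, `iterate_similarities`, `disambiguate`) and the toy "bank" data are hypothetical.

```python
def lookup(word_sim, w, v):
    # Unseen word pairs default to 1.0 for identical words, else 0.0 (assumption).
    return word_sim.get((w, v), 1.0 if w == v else 0.0)


def context_similarity(c1, c2, word_sim):
    # Average, over the words of c1, of the best word similarity to any word in c2.
    return sum(max(lookup(word_sim, w, v) for v in c2) for w in c1) / len(c1)


def iterate_similarities(contexts, n_iter=3):
    # Alternate between updating context similarity and word similarity.
    vocab = sorted({w for c in contexts for w in c})
    containing = {w: [i for i, c in enumerate(contexts) if w in c] for w in vocab}
    word_sim = {}  # empty dict: lookup() makes each word similar only to itself
    n = len(contexts)
    for _ in range(n_iter):
        # Contexts are similar if they contain similar words.
        ctx_sim = {
            (i, j): context_similarity(contexts[i], contexts[j], word_sim)
            for i in range(n)
            for j in range(n)
        }
        # Words are similar if they appear in similar contexts.
        new_word_sim = {}
        for w in vocab:
            for v in vocab:
                if w == v:
                    new_word_sim[(w, v)] = 1.0
                    continue
                best = [max(ctx_sim[(i, j)] for j in containing[v]) for i in containing[w]]
                new_word_sim[(w, v)] = sum(best) / len(best)
        word_sim = new_word_sim
    return word_sim


def disambiguate(context, sense_examples, word_sim):
    # Pick the sense whose most similar example usage best matches the context.
    def score(sense):
        return max(context_similarity(context, ex, word_sim) for ex in sense_examples[sense])
    return max(sense_examples, key=score)


# Toy example usages for two senses of "bank" (made-up data, for illustration only).
senses = {
    "money": [["deposit", "money", "account"], ["loan", "money", "interest"]],
    "river": [["river", "water", "shore"], ["fishing", "water", "boat"]],
}
all_contexts = [ex for examples in senses.values() for ex in examples]
sims = iterate_similarities(all_contexts)
print(disambiguate(["interest", "deposit"], senses, sims))
```

In this toy run, "deposit" and "loan" never share a context, yet they acquire a nonzero similarity because both co-occur with "money". That indirect link is the kind of generalization that lets a similarity-based approach cope with sparse data.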

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems