Exploring automatic word sense disambiguation with decision lists and the Web
Eneko Agirre, David Martinez

TL;DR
This paper evaluates decision lists for word sense disambiguation across multiple corpora, including one from the Web, revealing their effectiveness and limitations in handling polysemous words.
Contribution
It provides an in-depth analysis of how decision lists perform on two hand-tagged corpora (SemCor and DSO) and on a corpus automatically acquired from the Web, highlighting both the potential and the problems of Web-acquired training data.
Findings
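Trained on SemCor, decision lists reach about 0.7 precision on polysemous words; on DSO, 0.7 precision appears to be the current limit for some highly polysemous words.
SemCor is an acceptable starting point for an all-words disambiguation system.
A corpus automatically acquired from the Web fails as training data, and independently constructed hand-tagged corpora are not mutually useful.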
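The 0.7 figures above are precision scores. Because a decision list can abstain when no rule matches a context, precision (correct answers over attempted answers) can differ from recall (correct answers over all test instances), with coverage being the fraction attempted. A minimal sketch of these standard WSD scoring definitions follows; the function name `score` and the toy data are hypothetical, and the paper's own scoring procedure may differ in detail.

```python
from typing import Dict, Optional, Sequence


def score(predictions: Sequence[Optional[str]], gold: Sequence[str]) -> Dict[str, float]:
    """Precision, coverage and recall for a WSD system that may abstain (None)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be the same length")
    attempted = sum(p is not None for p in predictions)
    correct = sum(p is not None and p == g for p, g in zip(predictions, gold))
    total = len(gold)
    return {
        "precision": correct / attempted if attempted else 0.0,
        "coverage": attempted / total if total else 0.0,
        "recall": correct / total if total else 0.0,
    }


# Hypothetical example: 10 test instances, 8 attempted, 6 of them correct.
preds = ["a", "a", "b", "b", "a", "b", "a", "b", None, None]
gold = ["a", "a", "b", "b", "a", "b", "b", "a", "a", "b"]
print(score(preds, gold))  # {'precision': 0.75, 'coverage': 0.8, 'recall': 0.6}
```

A system that answers only when it is confident can raise precision at the cost of coverage, which is why precision and recall are usually reported together.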
Abstract
The most effective paradigm for word sense disambiguation, supervised learning, seems to be stuck because of the knowledge acquisition bottleneck. In this paper we present an in-depth study of the performance of decision lists on two publicly available corpora and an additional corpus automatically acquired from the Web, using the fine-grained, highly polysemous senses in WordNet. Decision lists are shown to be a versatile state-of-the-art technique. The experiments reveal, among other facts, that SemCor can be an acceptable (0.7 precision for polysemous words) starting point for an all-words system. The results on the DSO corpus show that for some highly polysemous words 0.7 precision seems to be the current state-of-the-art limit. On the other hand, independently constructed hand-tagged corpora are not mutually useful, and a corpus automatically acquired from the Web is shown to fail.
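Decision lists, in the form popularized by Yarowsky for word sense disambiguation, rank individual context features (for example, nearby words) by how strongly each one predicts a single sense, and label a new occurrence with the sense of the strongest matching feature. The sketch below is a minimal illustration under stated assumptions: the log-odds weight, the additive smoothing constant `alpha = 0.1`, the bag-of-words features, and the names `train_decision_list` and `classify` are illustrative choices, not the authors' exact feature set or implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Iterable, List, Optional, Set, Tuple

Rule = Tuple[float, Hashable, str]  # (weight, feature, sense)


def train_decision_list(
    examples: Iterable[Tuple[Set[Hashable], str]], alpha: float = 0.1
) -> List[Rule]:
    """Build a decision list from (context features, sense) training pairs.

    Each feature becomes one rule pointing at its most frequent sense, weighted by
    log(P(sense | feature) / P(any other sense | feature)) with additive smoothing.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive to avoid division by zero")
    counts = defaultdict(Counter)  # feature -> Counter of sense counts
    senses = set()
    for features, sense in examples:
        senses.add(sense)
        for f in set(features):
            counts[f][sense] += 1
    k = len(senses)
    if k < 2:
        raise ValueError("need at least two senses to weight rules")

    rules: List[Rule] = []
    for f, c in counts.items():
        total = sum(c.values())
        best_sense, best = c.most_common(1)[0]
        # Smoothed odds of the best sense against all competing senses combined;
        # the shared denominator (total + alpha * k) cancels in the ratio.
        weight = math.log((best + alpha) / (total - best + alpha * (k - 1)))
        if weight > 0:  # drop rules whose smoothed odds are not above 1
            rules.append((weight, f, best_sense))
    rules.sort(key=lambda r: r[0], reverse=True)  # strongest evidence first
    return rules


def classify(rules: List[Rule], features: Set[Hashable], threshold: float = 0.0) -> Optional[str]:
    """Return the sense of the first (strongest) rule whose feature is in the context.

    Returns None, i.e. abstains, when no rule with weight >= threshold matches.
    """
    for weight, f, sense in rules:
        if weight < threshold:
            break  # rules are sorted, so no later rule can pass the threshold
        if f in features:
            return sense
    return None


# Hypothetical toy data for two senses of "bank"; features are bag-of-words.
train = [
    ({"river", "water"}, "shore"),
    ({"river", "boat"}, "shore"),
    ({"money", "loan"}, "finance"),
    ({"money", "deposit"}, "finance"),
]
rules = train_decision_list(train)
print(classify(rules, {"river", "fishing"}))  # shore
print(classify(rules, {"fishing"}))  # None
```

Only the single strongest matching rule decides, so the classifier is easy to inspect; raising `threshold` makes it abstain more often, trading coverage for precision.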
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
