Automatic Ambiguity Detection

Richard Sproat; Jan van Santen

arXiv:1905.12065 · cs.CL · May 30, 2019

TL;DR

This paper introduces an algorithm that automatically detects ambiguous words in an unlabeled text corpus and measures their degree of polysemy. It addresses the incomplete coverage of the published lists of polysemous terms that sense disambiguation methods usually presume.

Contribution

It presents a novel unsupervised approach for identifying polysemous terms and quantifying their ambiguity without relying on predefined sense lists.

Findings

01 Successfully identifies polysemous terms in unlabeled corpora

02 Provides a quantitative polysemy index for each term

03 Addresses the partial coverage of predefined polysemous-term lists in sense disambiguation

Abstract

Most work on sense disambiguation presumes that one knows beforehand (e.g., from a thesaurus) a set of polysemous terms. But published lists invariably give only partial coverage. For example, the English word tan has several obvious senses, but one may overlook the abbreviation for tangent. In this paper, we present an algorithm for identifying interesting polysemous terms and measuring their degree of polysemy, given an unlabeled corpus. The algorithm involves: (i) collecting all terms within a k-term window of the target term; (ii) computing the inter-term distances of the contextual terms, and reducing the multi-dimensional distance space to two dimensions using standard methods; (iii) converting the two-dimensional representation into radial coordinates and using isotonic/antitonic regression to compute the degree to which the distribution deviates from a single-peak model. The…
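The abstract names the three steps but does not specify the distance between contextual terms, which "standard method" reduces the space to two dimensions, or the exact form of the single-peak test. The Python sketch below is a minimal, standard-library-only illustration under our own assumptions, not the authors' implementation. It assumes: the distance between two context terms is 1 minus the Jaccard overlap of the sets of target occurrences they appear near; the 2-D reduction is classical (Torgerson) MDS; the "radial coordinates" step takes the angle of each 2-D point about the origin (where classical MDS places the centroid) and bins those angles into a circular histogram; and the deviation score is the least-squares error of the best single-peak fit (isotonic up to the peak, antitonic after it), divided by the sum of squared bin counts. All function names (`collect_contexts`, `jaccard_distances`, `classical_mds_2d`, `isotonic`, `unimodal_error`, `polysemy_index`) and the parameters `bins`, `iters`, and `seed` are hypothetical.

```python
import math
import random
from collections import defaultdict


def collect_contexts(sentences, target, k):
    """Step (i): map each term within k tokens of `target` to the set of
    target occurrences (numbered 0, 1, ...) it appeared near."""
    occ_sets = defaultdict(set)
    occ = 0
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
                if j != i and tokens[j] != target:
                    occ_sets[tokens[j]].add(occ)
            occ += 1
    return dict(occ_sets)


def jaccard_distances(occ_sets):
    """Step (ii), ASSUMED metric: 1 - Jaccard overlap of occurrence sets."""
    terms = sorted(occ_sets)
    dist = [[1.0 - len(occ_sets[s] & occ_sets[t]) / len(occ_sets[s] | occ_sets[t])
             for t in terms] for s in terms]
    return terms, dist


def classical_mds_2d(dist, iters=1000, seed=0):
    """Step (ii), ASSUMED reduction: classical (Torgerson) MDS to 2-D, with the
    top two eigenvectors found by power iteration plus deflation."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    mean = [sum(row) / n for row in sq]  # row means == column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    # Gershgorin shift: b + shift*I is PSD, so power iteration finds the top eigenvalues.
    lower = min(b[i][i] - (sum(map(abs, b[i])) - abs(b[i][i])) for i in range(n))
    shift = max(0.0, -lower)
    rng = random.Random(seed)
    found, coords = [], [[0.0, 0.0] for _ in range(n)]
    for dim in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            for u in found:  # deflation: stay orthogonal to earlier eigenvectors
                dot = sum(x * y for x, y in zip(w, u))
                w = [x - dot * y for x, y in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient of the unshifted matrix estimates the eigenvalue.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        found.append(v)
        for i in range(n):
            coords[i][dim] = v[i] * math.sqrt(max(lam, 0.0))
    return coords


def isotonic(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # [sum, count] for each pooled block
    for v in y:
        blocks.append([v, 1])
        # Pool while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def unimodal_error(hist):
    """Step (iii): least-squares error of the best single-peak fit to a
    circular histogram (isotonic before a split point, antitonic after it)."""
    def sse(y, fit):
        return sum((a - f) ** 2 for a, f in zip(y, fit))
    best, n = math.inf, len(hist)
    for r in range(n):  # try every rotation, since angles wrap around
        h = hist[r:] + hist[:r]
        for m in range(1, n + 1):
            up, down = h[:m], h[m:]
            err = sse(up, isotonic(up)) + sse(down, isotonic(down[::-1])[::-1])
            best = min(best, err)
    return best


def polysemy_index(sentences, target, k=3, bins=8):
    """HYPOTHETICAL score in [0, 1]: single-peak fit error of the angular
    histogram of the 2-D context points, over the sum of squared counts."""
    occ_sets = collect_contexts(sentences, target, k)
    if len(occ_sets) < 2:
        return 0.0
    _, dist = jaccard_distances(occ_sets)
    hist = [0] * bins
    for x, y in classical_mds_2d(dist):  # ASSUMED: angle of each point about the origin
        angle = math.atan2(y, x) % (2 * math.pi)
        hist[min(int(angle / (2 * math.pi) * bins), bins - 1)] += 1
    return unimodal_error(hist) / sum(c * c for c in hist)
```

Usage: `polysemy_index(sentences, "tan", k=2)`, where `sentences` is a list of tokenized sentences. Under these assumptions, a target whose context terms fall into groups that rarely share occurrences (say, sun/skin/beach contexts versus sine/angle/cosine contexts for tan) should tend to give a multi-peaked angular histogram and a larger score than a target whose contexts overlap heavily; with only a handful of context terms the score is noisy. The sketch avoids third-party packages so it runs anywhere; on real corpora one would more likely use `numpy.linalg.eigh` for the eigendecomposition and scikit-learn's `IsotonicRegression` (with `increasing=False` for the antitonic half).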

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies