Unsupervised Abbreviation Disambiguation: Contextual disambiguation using word embeddings
Manuel Ciosici, Tobias Sommer, Ira Assent

TL;DR
This paper introduces an unsupervised method for abbreviation disambiguation using word embeddings, which learns context representations from unstructured text and outperforms existing approaches on large real-world datasets.
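The core idea of choosing a meaning from context with word embeddings can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring procedure: it assumes each meaning of an abbreviation already has its own vector, averages the vectors of the surrounding words, and picks the meaning with the highest cosine similarity. The function names, the `ABBR::long_form` token format, and the 3-dimensional vectors are hypothetical and hand-picked, not learned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def disambiguate(context_words, meaning_vectors, word_vectors):
    """Return the meaning token whose vector is closest to the averaged context.

    context_words:   words around one occurrence of the abbreviation
    meaning_vectors: {meaning_token: vector}, one entry per long form (assumed given)
    word_vectors:    {word: vector} for ordinary vocabulary
    """
    known = [word_vectors[w] for w in context_words if w in word_vectors]
    if not known:
        return None  # no usable context words, so no decision is made
    ctx = mean_vector(known)
    return max(meaning_vectors, key=lambda m: cosine(ctx, meaning_vectors[m]))


# Toy vectors (hypothetical, chosen by hand for illustration only).
word_vectors = {
    "layer": [0.9, 0.1, 0.0],
    "training": [0.8, 0.2, 0.1],
    "news": [0.1, 0.9, 0.0],
    "anchor": [0.0, 0.8, 0.3],
}
meaning_vectors = {
    "CNN::convolutional_neural_network": [0.85, 0.15, 0.05],
    "CNN::cable_news_network": [0.05, 0.9, 0.2],
}

print(disambiguate(["layer", "training"], meaning_vectors, word_vectors))
print(disambiguate(["news", "anchor"], meaning_vectors, word_vectors))
```

With these toy vectors the first call selects the neural-network meaning and the second the news-network meaning. Averaging context vectors is only one simple way to build a context representation; the paper's own method may weight or select context words differently.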
Contribution
The paper presents the first transparent, unsupervised abbreviation disambiguation method that does not require labeled data and scales to thousands of abbreviations with multiple meanings.
Findings
UAD achieves high accuracy on diverse real-world datasets.
UAD outperforms baseline and state-of-the-art methods.
The approach scales efficiently to large vocabularies.
Abstract
Abbreviations often have several distinct meanings, which makes their use in text ambiguous. Expanding them to their intended meaning in context is important for Machine Reading tasks such as document search, recommendation and question answering. Existing approaches mostly rely on manually labeled examples of abbreviations and their correct long-forms. Such data sets are costly to create and result in trained models with limited applicability and flexibility. Importantly, most current methods must be subjected to a full empirical evaluation in order to understand their limitations, which is cumbersome in practice. In this paper, we present an entirely unsupervised abbreviation disambiguation method (called UAD) that picks up abbreviation definitions from unstructured text. Creating distinct tokens per meaning, we learn context representations as word vectors. We demonstrate how to…
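The two preprocessing steps named in the abstract, picking up definitions from raw text and creating a distinct token per meaning, can be sketched as below. This is a hedged illustration, not the authors' extraction rule: it assumes definitions appear as "long form (ABBR)" and accepts a candidate only when the initials of the preceding words spell the abbreviation. The regular expression, the helper names `extract_definitions` and `tag_meanings`, and the `ABBR::long_form` token format are all hypothetical.

```python
import re

# Hypothetical pattern: 1-8 words followed by an all-caps abbreviation in
# parentheses, e.g. "convolutional neural network (CNN)".
DEF_PATTERN = re.compile(r"\b((?:[A-Za-z-]+\s+){1,8})\(([A-Z]{2,10})\)")


def extract_definitions(text):
    """Return {abbreviation: long_form} for definitions found in text.

    Simplified heuristic (an assumption): take the last len(abbr) words
    before the parenthesis and keep them only if their initials spell
    the abbreviation.
    """
    found = {}
    for match in DEF_PATTERN.finditer(text):
        words = match.group(1).split()
        abbr = match.group(2)
        n = len(abbr)
        if len(words) >= n:
            candidate = words[-n:]
            initials = "".join(w[0] for w in candidate).upper()
            if initials == abbr:
                found[abbr] = " ".join(w.lower() for w in candidate)
    return found


def tag_meanings(text, definitions):
    """Replace every whole-word abbreviation with a per-meaning token.

    The ABBR::long_form format is hypothetical; any scheme that gives each
    meaning its own vocabulary entry would serve the same purpose.
    """
    for abbr, long_form in definitions.items():
        token = f"{abbr}::{long_form.replace(' ', '_')}"
        text = re.sub(rf"\b{re.escape(abbr)}\b", token, text)
    return text


doc = "We train a convolutional neural network (CNN) on images. The CNN has five layers."
defs = extract_definitions(doc)
print(defs)
print(tag_meanings(doc, defs))
```

A tagged corpus like this could then be passed to any word-embedding trainer (for example, gensim's `Word2Vec`), so that each meaning token receives its own vector learned from the contexts it appears in. Real definition extraction needs to handle many more patterns than this single heuristic.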
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
