Towards Resolving Word Ambiguity with Word Embeddings
Matthias Thurnbauer, Johannes Reisinger, Christoph Goller, Andreas Fischer

TL;DR
This paper introduces a method using DBSCAN clustering on word embedding spaces to identify and evaluate ambiguous words, providing a resource-efficient alternative to complex models for resolving word ambiguity.
Contribution
It proposes a novel approach to detect ambiguous words and assess their ambiguity levels using clustering, reducing reliance on costly transformer models.
Findings
DBSCAN effectively identifies ambiguous words in embedding space
Automatic parameter selection yields semantically coherent clusters
Method offers a resource-efficient alternative for ambiguity detection
Abstract
Ambiguity is ubiquitous in natural language. Resolving ambiguous meanings is especially important in information retrieval tasks. While word embeddings carry semantic information, they fail to handle ambiguity well. Transformer models have been shown to handle word ambiguity for complex queries, but they cannot be used to identify ambiguous words, e.g. for a 1-word query. Furthermore, training these models is costly in terms of time, hardware resources, and training data, prohibiting their use in specialized environments with sensitive data. Word embeddings can be trained using moderate hardware resources. This paper shows that applying DBSCAN clustering to the latent space can identify ambiguous words and evaluate their level of ambiguity. An automatic DBSCAN parameter selection leads to high-quality clusters, which are semantically coherent and correspond well to the perceived…
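Example (illustrative only)
The abstract does not say how the clustering is wired up, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the embedding vectors of a query word's nearest neighbours are clustered with DBSCAN under cosine distance, and that more than one resulting cluster hints at more than one sense. The toy 2-D vectors, the values of eps and min_pts, and the helper names cosine_distance, dbscan, and count_sense_clusters are all hypothetical. The paper's automatic parameter selection is not reproduced here.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def region(i):
        # Indices of all points within eps of point i (includes i itself).
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels

def count_sense_clusters(labels):
    """Number of clusters, ignoring noise (assumed proxy for the number of senses)."""
    return len(set(labels) - {-1})

# Hypothetical neighbour vectors of one query word (made-up toy data).
sense_a = [(1.0, 0.05), (0.98, 0.1), (0.99, 0.0), (0.97, 0.08)]
sense_b = [(0.05, 1.0), (0.1, 0.97), (0.0, 0.99)]
outlier = [(-1.0, -1.0)]

labels = dbscan(sense_a + sense_b + outlier, eps=0.02, min_pts=3)
n_senses = count_sense_clusters(labels)
print(labels, n_senses)
```

In this toy setup the neighbours fall into two dense groups plus one noise point, so the word would be flagged as ambiguous; a word whose neighbours form a single cluster would not be.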
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Dense Connections · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer
