Towards Resolving Word Ambiguity with Word Embeddings

Matthias Thurnbauer; Johannes Reisinger; Christoph Goller; Andreas Fischer

arXiv:2307.13417·cs.CL·July 26, 2023·1 cites

Open Access

TL;DR

This paper introduces a method using DBSCAN clustering on word embedding spaces to identify and evaluate ambiguous words, providing a resource-efficient alternative to complex models for resolving word ambiguity.

Contribution

The paper proposes a novel approach that uses clustering to detect ambiguous words and assess their level of ambiguity, reducing reliance on costly transformer models.

Findings

01. DBSCAN effectively identifies ambiguous words in embedding space.

02. Automatic parameter selection yields semantically coherent clusters.

03. The method offers a resource-efficient alternative for ambiguity detection.

Abstract

Ambiguity is ubiquitous in natural language. Resolving ambiguous meanings is especially important in information retrieval tasks. While word embeddings carry semantic information, they fail to handle ambiguity well. Transformer models have been shown to handle word ambiguity for complex queries, but they cannot be used to identify ambiguous words, e.g. for a 1-word query. Furthermore, training these models is costly in terms of time, hardware resources, and training data, prohibiting their use in specialized environments with sensitive data. Word embeddings can be trained using moderate hardware resources. This paper shows that applying DBSCAN clustering to the latent space can identify ambiguous words and evaluate their level of ambiguity. An automatic DBSCAN parameter selection leads to high-quality clusters, which are semantically coherent and correspond well to the perceived…
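The clustering idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 2-D vectors are hand-made stand-ins for real word embeddings, `eps` and `min_samples` are fixed by hand rather than chosen by the paper's automatic parameter selection, and the helper `count_senses` is a hypothetical name. The underlying assumption is that the nearest neighbours of an ambiguous word fall into several dense groups, so the number of DBSCAN clusters among them gives a rough indication of how ambiguous the word is.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point, -1 marks noise."""
    n = len(points)
    # Precompute each point's eps-neighbourhood (it includes the point itself).
    neighbours = [
        [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand through core points only
    return labels


def count_senses(neighbour_vectors, eps=0.05, min_samples=2):
    """Hypothetical helper: number of DBSCAN clusters, ignoring noise."""
    labels = dbscan(neighbour_vectors, eps, min_samples)
    return len(set(labels) - {-1})


# Toy neighbourhood of an ambiguous word such as "bank" (assumed data):
# three finance-like vectors and three river-like vectors.
bank_neighbours = [
    (1.0, 0.05), (0.95, 0.1), (0.9, 0.0),   # e.g. money, loan, credit
    (0.05, 1.0), (0.1, 0.95), (0.0, 0.9),   # e.g. river, shore, water
]
# Toy neighbourhood of an unambiguous word: finance-like vectors only.
loan_neighbours = bank_neighbours[:3]

print(count_senses(bank_neighbours))  # prints 2
print(count_senses(loan_neighbours))  # prints 1
```

With real embeddings, the neighbour vectors would come from a trained model, and the choice of `eps` and `min_samples` would strongly affect the result, which is the part the paper addresses with automatic parameter selection.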

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Dense Connections · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer