Homonym Identification using BERT -- Using a Clustering Approach
Rohan Saha

TL;DR
This paper explores the use of BERT embeddings combined with clustering algorithms to identify homonyms in text, aiming to improve coarse-grained word sense disambiguation by leveraging contextual information.
Contribution
It introduces a clustering approach over BERT embeddings for homonym identification, demonstrating the potential of contextual embeddings over static embeddings such as Word2Vec.
Findings
BERT embeddings effectively capture context for homonym detection
Clustering algorithms can distinguish different senses in embedding space
Visualization supports the feasibility of the clustering approach
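The clustering idea behind these findings can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The page does not name the clustering algorithms the paper applies, so plain k-means (Lloyd's algorithm with farthest-first seeding) is swapped in. Random Gaussian vectors in 768 dimensions (the BERT-base hidden size) stand in for the contextual embeddings of two senses of one homonym. The helpers `farthest_first_init`, `assign`, `kmeans`, and `purity` are hypothetical names. Cluster purity measures how well the clusters agree with gold sense labels, such as those a sense-annotated corpus like SemCor provides.

```python
import numpy as np


def farthest_first_init(X, k):
    # Hypothetical seeding helper: start at X[0], then repeatedly add the
    # point with the largest squared distance to its nearest chosen centroid.
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    return np.array(centroids, dtype=float)


def assign(X, centroids):
    # Index of the nearest centroid (squared Euclidean distance) for each row.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    # Lloyd's algorithm: alternate the assignment and mean-update steps.
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # New centroid = mean of its members; keep the old one if a cluster empties.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return assign(X, centroids), centroids


def purity(clusters, senses):
    # Share of occurrences that carry the majority sense of their cluster.
    hits = sum(np.bincount(senses[clusters == c]).max() for c in np.unique(clusters))
    return hits / len(senses)


# Hypothetical stand-in for BERT vectors: 50 occurrences of each of two
# senses of one homonym, 768-dimensional (the BERT-base hidden size).
rng = np.random.default_rng(0)
dim, n = 768, 50
sense_means = rng.normal(size=(2, dim))
X = np.vstack([m + 0.3 * rng.normal(size=(n, dim)) for m in sense_means])
senses = np.repeat([0, 1], n)  # gold sense label per occurrence

clusters, _ = kmeans(X, k=2)
print("purity:", purity(clusters, senses))
```

With synthetic senses this well separated, the purity should reach 1.0. Real BERT vectors for a homonym are likely to overlap more. The number of clusters k would also have to be chosen or estimated per word rather than fixed at 2.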
Abstract
Homonym identification is important for word sense disambiguation (WSD) tasks that require coarse-grained partitions of senses. The goal of this project is to determine whether contextual information is sufficient for identifying a homonymous word. To capture the context, BERT embeddings are used rather than Word2Vec embeddings, which conflate all senses of a word into a single vector. The embeddings are extracted for words in SemCor, a sense-annotated corpus. Various clustering algorithms are applied to the embeddings. Finally, the embeddings are visualized in a lower-dimensional space to assess the feasibility of the clustering process.
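The visualization step can be sketched with principal component analysis. The page does not say which dimensionality-reduction method the paper uses, so PCA computed via `numpy.linalg.svd` is an assumption here (t-SNE or UMAP would be common alternatives). Hypothetical synthetic vectors again stand in for BERT embeddings, and `pca_2d` is a hypothetical helper name.

```python
import numpy as np


def pca_2d(X):
    # Hypothetical helper: centre the data, then project it onto the two
    # leading right-singular vectors (the top-2 principal components).
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:2].T                    # shape (n_samples, 2)
    explained = S[:2] ** 2 / (S ** 2).sum()   # variance ratio of each component
    return coords, explained


# Hypothetical stand-in for BERT vectors of two senses of one homonym
# (768-dimensional, 50 occurrences per sense).
rng = np.random.default_rng(0)
dim, n = 768, 50
sense_means = rng.normal(size=(2, dim))
X = np.vstack([m + 0.3 * rng.normal(size=(n, dim)) for m in sense_means])
senses = np.repeat([0, 1], n)  # gold sense label per occurrence

coords, explained = pca_2d(X)
print(coords.shape, explained.round(3))
```

The returned `coords` could be passed to `matplotlib.pyplot.scatter(coords[:, 0], coords[:, 1], c=senses)` to colour points by sense. If occurrences of different senses form visibly separate groups in the projection, a clustering algorithm has a reasonable chance of recovering them.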
Taxonomy
Methods: Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Softmax · Dropout · Dense Connections · Attention Is All You Need · Multi-Head Attention · WordPiece · Attention Dropout
