Topological Data Analysis for Word Sense Disambiguation
Michael Rawson, Samuel Dooley, Mithun Bharadwaj, and Rishabh Choudhary

TL;DR
This paper introduces a novel unsupervised method for word sense induction and disambiguation based on topological data analysis, specifically persistent homology barcodes, and reports low relative error on word sense induction on the SemCor dataset.
Contribution
It applies advanced topological concepts to NLP, offering a new perspective beyond traditional clustering methods for word sense induction.
Findings
Reports low relative error on word sense induction on the SemCor dataset
Shows the promise of topological algorithms for NLP
Advocates further research on topological data analysis for language processing
Abstract
We develop and test a novel unsupervised algorithm for word sense induction and disambiguation which uses topological data analysis. Typical approaches to the problem involve clustering based on simple low-level features of distance in word embeddings. Our approach relies on advanced mathematical concepts from the field of topology, which provide a richer conceptualization of clusters for the word sense induction task. We use a persistent homology barcode algorithm on the SemCor dataset and demonstrate that our approach gives low relative error on word sense induction. This shows the promise of topological algorithms for natural language processing, and we advocate for future work in this area.
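To make the idea of a persistent homology barcode concrete, here is a minimal sketch in Python. It is not the authors' algorithm, which this summary does not spell out. It computes only the 0-dimensional barcode (connected components) of a Vietoris-Rips filtration over a small point cloud, assuming Euclidean distance between context embeddings. The function names `h0_barcode` and `estimate_num_senses`, the toy vectors, and the hand-picked `gap_threshold` are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Death times of the H0 bars of a Vietoris-Rips filtration.

    Every point is born at scale 0. A bar dies at the edge length where its
    component merges into another one (Kruskal's algorithm with union-find).
    The last surviving component never dies, so its death time is infinity.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (assumed metric).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one bar ends
            deaths.append(length)
    deaths.append(math.inf)  # the component that survives at every scale
    return deaths


def estimate_num_senses(points, gap_threshold):
    """Count H0 bars that outlive gap_threshold (a hypothetical cutoff)."""
    return sum(1 for death in h0_barcode(points) if death > gap_threshold)


# Toy 2-D "context embeddings" for one ambiguous word: two tight groups.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
bars = h0_barcode(toy)
num_senses = estimate_num_senses(toy, gap_threshold=1.0)
```

On the toy data, four bars die at a scale of about 0.1 (points merging within a group), one bar dies at a scale of about 7 (the two groups merging), and one bar never dies. Counting the bars that outlive the threshold therefore suggests two senses. A real pipeline would use high-dimensional embeddings of SemCor contexts and would need a principled way to choose the cutoff.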
Taxonomy
Topics: Topological and Geometric Data Analysis · Homotopy and Cohomology in Algebraic Topology
