Topological Data Analysis for Word Sense Disambiguation

Michael Rawson, Samuel Dooley, Mithun Bharadwaj, and Rishabh Choudhary

arXiv:2203.00565 · cs.CL · March 2, 2022 · 1 citation

Open Access

TL;DR

This paper introduces an unsupervised method for word sense induction and disambiguation based on topological data analysis, specifically persistent homology barcodes, and reports low relative error on word sense induction on the SemCor dataset.

Contribution

The paper applies persistent homology to word sense induction, offering a richer notion of cluster than the distance-based clustering of word embeddings that typical approaches rely on.

Findings

01. Low relative error on word sense induction tasks

02. Demonstrates the effectiveness of topological algorithms in NLP

03. Encourages further research in topological data analysis for language processing

Abstract

We develop and test a novel unsupervised algorithm for word sense induction and disambiguation that uses topological data analysis. Typical approaches to the problem involve clustering based on simple, low-level features of distance in word embeddings. Our approach relies on advanced mathematical concepts from the field of topology, which provide a richer conceptualization of clusters for the word sense induction task. We use a persistent homology barcode algorithm on the SemCor dataset and demonstrate that our approach gives low relative error on word sense induction. This shows the promise of topological algorithms for natural language processing, and we advocate for future work in this area.
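The abstract does not spell out the pipeline, so the following is only a minimal sketch of how a persistent homology barcode could be used to guess the number of senses of a word. It assumes each occurrence of the word is represented by a context embedding, computes only the 0-dimensional barcode of a Vietoris-Rips filtration (connected components, via a minimum spanning tree), and counts the long-lived bars. The function names, the `min_persistence` threshold, the toy 2-D points, and the relative error formula are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return the finite death times of the 0-dimensional barcode.

    In a Vietoris-Rips filtration every point is born at scale 0, and a
    connected component dies when an edge merges it into another one.
    Those death times are the edge lengths of a minimum spanning tree,
    found here with Kruskal's algorithm and a union-find structure.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 finite bars; one additional bar never dies


def estimate_num_senses(points, min_persistence):
    """Count bars that outlive min_persistence (a hypothetical cutoff)."""
    deaths = h0_barcode(points)
    # The "+ 1" accounts for the single component that never dies.
    return 1 + sum(1 for death in deaths if death > min_persistence)


def relative_error(predicted, actual):
    """One plausible reading of relative error: |predicted - actual| / actual."""
    return abs(predicted - actual) / actual


# Toy stand-in for context embeddings: two well-separated groups of points.
contexts = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
n_senses = estimate_num_senses(contexts, min_persistence=1.0)
print(n_senses)  # prints 2
```

Restricting to dimension 0 makes the sketch equivalent to cutting a single-linkage dendrogram; higher-dimensional bars (loops, voids) would need a dedicated persistent homology library and are left out here.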

Taxonomy

Topics: Topological and Geometric Data Analysis · Homotopy and Cohomology in Algebraic Topology