Topology of Word Embeddings: Singularities Reflect Polysemy
Alexander Jakubowski, Milica Gašić, Marcus Zibrowius

TL;DR
This paper studies the topological structure of word embeddings, arguing that polysemous words correspond to singular points of a pinched manifold, and introduces a measure based on persistent homology that distinguishes monosemous from polysemous words.
Contribution
It proposes a novel topological framework for understanding word embeddings, linking singularities to polysemy, and offers empirical methods for measuring and disambiguating word senses.
Findings
Topological measure correlates with number of word meanings
Singular points in embeddings indicate polysemy
Topologically motivated approach achieves competitive results in word sense disambiguation
Abstract
The manifold hypothesis suggests that word vectors live on a submanifold within their ambient vector space. We argue that we should, more accurately, expect them to live on a pinched manifold: a singular quotient of a manifold obtained by identifying some of its points. The identified, singular points correspond to polysemous words, i.e. words with multiple meanings. Our point of view suggests that monosemous and polysemous words can be distinguished based on the topology of their neighbourhoods. We present two kinds of empirical evidence to support this point of view: (1) We introduce a topological measure of polysemy based on persistent homology that correlates well with the actual number of meanings of a word. (2) We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation that produces competitive results.
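The idea that monosemous and polysemous words differ in the topology of their neighbourhoods can be illustrated with a toy computation. The sketch below is not the authors' measure: it is a simplified stand-in that uses only 0-dimensional persistent homology (connected components) of a punctured neighbourhood, computed with a union-find over sorted pairwise distances. The function names, the threshold value, and the synthetic data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Finite death times of 0-dimensional classes in a Vietoris-Rips
    filtration (every class is born at 0; one class never dies)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(dist)
    return deaths


def punctured_components(center, neighbours, threshold):
    """Hypothetical helper: remove the centre, project its neighbours onto
    the unit sphere around it, and count the components that survive past
    `threshold` (an assumed, hand-picked scale)."""
    directions = []
    for p in neighbours:
        diff = [a - b for a, b in zip(p, center)]
        norm = math.hypot(*diff)
        if norm > 0:
            directions.append([x / norm for x in diff])
    return 1 + sum(1 for d in h0_deaths(directions) if d > threshold)


def circle(n, axes, dim=4, radius=0.1):
    """n points on a circle in the coordinate plane spanned by `axes`."""
    pts = []
    for k in range(n):
        p = [0.0] * dim
        angle = 2 * math.pi * k / n
        p[axes[0]] = radius * math.cos(angle)
        p[axes[1]] = radius * math.sin(angle)
        pts.append(p)
    return pts


origin = [0.0, 0.0, 0.0, 0.0]
# Regular point: neighbours sampled from a single 2-dimensional sheet.
regular = punctured_components(origin, circle(12, (0, 1)), threshold=0.6)
# Pinched point: two sheets glued together at the origin.
pinched = punctured_components(
    origin, circle(12, (0, 1)) + circle(12, (2, 3)), threshold=0.6
)
print(regular, pinched)
```

In this synthetic setting the punctured neighbourhood of the regular point stays connected, while that of the pinched point splits into one component per sheet, which mirrors the intuition of one component per word sense. Real embeddings are noisy and high-dimensional, so a usable measure would need higher-dimensional persistence and a principled choice of scale rather than a fixed threshold.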
Taxonomy
Topics: Topological and Geometric Data Analysis · Natural Language Processing Techniques · Topic Modeling
