Reasoning about Linguistic Regularities in Word Embeddings using Matrix Manifolds
Sridhar Mahadevan, Sarath Chandar

TL;DR
This paper introduces a novel method for capturing linguistic analogies in word embeddings by modeling subspaces on Grassmannian manifolds, leading to improved analogy task performance.
Contribution
It proposes a new approach using subspace modeling on Grassmannian manifolds and geodesic kernels to better capture word relations in embeddings.
Findings
Significantly better performance on analogy tasks.
Effective modeling of relation-specific distances.
Improved understanding of semantic regularities.
Abstract
Recent work has explored methods for learning continuous vector space word representations reflecting the underlying semantics of words. Simple vector space arithmetic using cosine distances has been shown to capture certain types of analogies, such as reasoning about plurals from singulars, past tense from present tense, etc. In this paper, we introduce a new approach to capture analogies in continuous word representations, based on modeling not just individual word vectors, but rather the subspaces spanned by groups of words. We exploit the property that the set of subspaces in n-dimensional Euclidean space forms a curved manifold space called the Grassmannian, a quotient subgroup of the Lie group of rotations in n dimensions. Based on this mathematical model, we develop a modified cosine distance model based on geodesic kernels that captures relation-specific distances across word…
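The two ingredients named in the abstract, cosine-based vector arithmetic for analogies and distances between subspaces on the Grassmannian, can be illustrated with a minimal NumPy sketch. This is not the paper's geodesic-kernel model, whose details are not given in the truncated abstract. It only shows the standard vector-offset baseline and the standard principal-angle distance between two subspaces. The toy 4-dimensional embeddings and all function names (`analogy`, `orthonormal_basis`, `principal_angles`, `grassmann_distance`) are hypothetical and chosen for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(emb, a, b, c):
    """Vector-offset baseline: answer 'a is to b as c is to ?'.

    Returns the word whose vector is closest in cosine similarity to
    emb[b] - emb[a] + emb[c], excluding the three query words.
    """
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def orthonormal_basis(vectors, k):
    """Orthonormal basis (n x k) for the subspace spanned by a group of word vectors.

    Uses the top-k right singular vectors of the matrix whose rows are the vectors.
    """
    _, _, vt = np.linalg.svd(np.asarray(vectors, dtype=float), full_matrices=False)
    return vt[:k].T

def principal_angles(A, B):
    """Principal angles between the subspaces spanned by orthonormal bases A and B."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_distance(A, B):
    """Geodesic distance on the Grassmannian: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(A, B)))

# Hypothetical toy embeddings (axes: royalty, male, female, fruit); not real data.
emb = {
    "man":   np.array([0.0, 1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0, 0.0]),
    "king":  np.array([1.0, 1.0, 0.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 0.0, 1.0]),
}

# Two 2-dimensional subspaces of R^4 that share exactly one direction.
A = orthonormal_basis([[1, 0, 0, 0], [0, 1, 0, 0]], k=2)
B = orthonormal_basis([[1, 0, 0, 0], [0, 0, 1, 0]], k=2)
```

With these toy vectors, `analogy(emb, "man", "woman", "king")` selects `"queen"`, and `grassmann_distance(A, B)` equals π/2 because the two subspaces share one direction and are orthogonal in the other. A relation-specific model in the spirit of the abstract would replace the plain cosine with a kernel derived from subspaces of related word groups; that step is left out here because the abstract does not specify it.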
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
