TL;DR
This paper investigates the stability of word embeddings, revealing that even common words can be unstable and examining how this impacts their effectiveness in downstream applications.
Contribution
It provides empirical analysis of factors affecting embedding stability and explores the implications for downstream tasks, addressing a gap in understanding embedding limitations.
Findings
Even relatively high-frequency words (100-200 occurrences) are often unstable.
The paper gives empirical evidence for how various factors contribute to embedding stability.
The paper analyzes how stability affects downstream tasks.
Abstract
Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.
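The abstract does not say how stability is measured. As a minimal sketch, assume it is scored as the fraction of a word's k nearest neighbors (by cosine similarity) that two independently trained embedding spaces share. That metric, the function names, and the toy vectors below are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, space, k):
    """Return the set of the k words most similar to `word` in `space`."""
    target = space[word]
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(target, space[w]), reverse=True)
    return set(others[:k])


def stability(word, space_a, space_b, k=10):
    """Assumed metric: share of k nearest neighbors common to both spaces.

    Only words present in both spaces are considered, so the two
    neighbor sets are drawn from the same vocabulary.
    """
    shared = sorted(set(space_a) & set(space_b))
    if word not in shared:
        raise KeyError(f"{word!r} is missing from one of the spaces")
    k = min(k, len(shared) - 1)
    if k <= 0:
        raise ValueError("need at least one other shared word")
    a = {w: space_a[w] for w in shared}
    b = {w: space_b[w] for w in shared}
    return len(nearest_neighbors(word, a, k) & nearest_neighbors(word, b, k)) / k


# Hypothetical toy spaces (2-dimensional vectors, made up for illustration).
space_a = {"cat": (1.0, 0.0), "dog": (0.9, 0.1), "car": (0.0, 1.0), "bus": (0.1, 0.9)}
# Same geometry with the axes swapped: neighborhoods are preserved.
space_b = {"cat": (0.0, 1.0), "dog": (0.1, 0.9), "car": (1.0, 0.0), "bus": (0.9, 0.1)}
# "dog" and "car" trade places: the nearest neighbor of "cat" changes.
space_c = {"cat": (1.0, 0.0), "dog": (0.0, 1.0), "car": (0.9, 0.1), "bus": (0.1, 0.9)}

print(stability("cat", space_a, space_b, k=1))
print(stability("cat", space_a, space_c, k=1))
```

Under this assumed metric, a score near 1 means a word keeps the same neighborhood across training runs, and a score near 0 means its neighborhood changes almost entirely. Swapping the axes in `space_b` leaves every cosine similarity unchanged, so an overlap-based score ignores rotations of the space and reflects only changes in which words are close to each other.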
