Factors Influencing the Surprising Instability of Word Embeddings

Laura Wendlandt; Jonathan K. Kummerfeld; Rada Mihalcea

arXiv:1804.09692·cs.CL·June 5, 2020

TL;DR

This paper studies the stability of word embeddings. It shows that even relatively high-frequency words (100-200 occurrences) are often unstable, and it analyzes how stability affects downstream tasks.

Contribution

The paper gives empirical evidence for how various factors contribute to embedding stability and analyzes the effects of stability on downstream tasks, adding to the small body of work on the limitations of word embeddings.

Findings

01 Even relatively high-frequency words (100-200 occurrences) are often unstable.

02 Various factors influence embedding stability.

03 Stability affects downstream tasks.

Abstract

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.
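The abstract does not spell out how stability is measured. As a minimal sketch, assume stability is the fraction of a word's k nearest neighbors (by cosine similarity) that two embedding spaces have in common, and that both spaces share the same vocabulary. The function names and toy vectors below are hypothetical and for illustration only; this is not the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, space, k):
    """Return the set of the k words closest to `word` by cosine similarity."""
    target = space[word]
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(target, space[w]), reverse=True)
    return set(others[:k])


def stability(word, space_a, space_b, k=10):
    """Fraction of the k nearest neighbors of `word` shared by both spaces.

    Assumption: both spaces contain the same vocabulary with more than k words.
    """
    shared = nearest_neighbors(word, space_a, k) & nearest_neighbors(word, space_b, k)
    return len(shared) / k


# Toy 2-d embeddings (hypothetical values, for illustration only).
space_a = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "pet": [0.8, 0.3],
    "car": [0.0, 1.0],
    "bus": [0.1, 0.9],
}
space_b = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.2],
    "pet": [0.1, 0.9],
    "car": [0.7, 0.3],
    "bus": [0.0, 1.0],
}

# "cat" keeps "dog" as a neighbor in both spaces but swaps "pet" for "car".
print(stability("cat", space_a, space_b, k=2))  # 0.5
```

Under this assumed definition, a score of 1.0 means the word has identical nearest neighbors in both spaces, and a score near 0 means its neighborhood changed almost entirely, for example between two training runs on the same data.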
