Are Word Embedding Methods Stable and Should We Care About It?

Angana Borah; Manash Pratim Barman; Amit Awekar

arXiv:2104.08433 · cs.CL · June 13, 2024

TL;DR

This paper investigates the stability of popular word embedding methods across different datasets and parameters, revealing that fastText is the most stable among Word2Vec, GloVe, and fastText, and examines implications for downstream tasks.

Contribution

The paper introduces a systematic way to measure the stability of word embedding methods (WEMs) using intrinsic evaluation, and analyzes how stability varies with training parameters and datasets.

Findings

1. fastText is the most stable WEM among the three studied.
2. Stability varies significantly with training parameters and datasets.
3. Stability impacts downstream tasks like clustering and POS tagging.

Abstract

A representation learning method is considered stable if it consistently generates similar representations of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate a dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on three downstream tasks: Clustering, POS tagging, and Fairness evaluation. Our experiments…
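The abstract describes stability as agreement between runs, measured through word similarity, but this page does not give the exact metric. As a rough illustration only, the sketch below assumes one plausible formulation: for each word, take its top-k nearest neighbors by cosine similarity in two independently trained runs and average the Jaccard overlap of those neighbor sets. The function names (`cosine`, `top_k_neighbors`, `neighbor_overlap_stability`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(emb, word, k):
    """Return the set of the k words most similar to `word` in `emb`."""
    scored = [
        (cosine(emb[word], vec), other)
        for other, vec in emb.items()
        if other != word
    ]
    # Sort by descending similarity; break ties alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return {other for _, other in scored[:k]}


def neighbor_overlap_stability(run_a, run_b, k=1):
    """Average Jaccard overlap of top-k neighbor sets across two runs.

    Assumed metric for illustration; not necessarily the paper's definition.
    Only words present in both runs are compared.
    """
    shared = sorted(set(run_a) & set(run_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared words")
    sub_a = {w: run_a[w] for w in shared}
    sub_b = {w: run_b[w] for w in shared}
    total = 0.0
    for w in shared:
        na = top_k_neighbors(sub_a, w, k)
        nb = top_k_neighbors(sub_b, w, k)
        total += len(na & nb) / len(na | nb)
    return total / len(shared)


# Toy embeddings (hypothetical): run_b is run_a rotated by 90 degrees,
# run_c places the words so that every nearest neighbor changes.
run_a = {"king": [1.0, 0.1], "queen": [0.9, 0.2],
         "apple": [0.1, 1.0], "pear": [0.2, 0.9]}
run_b = {"king": [-0.1, 1.0], "queen": [-0.2, 0.9],
         "apple": [-1.0, 0.1], "pear": [-0.9, 0.2]}
run_c = {"king": [1.0, 0.1], "queen": [0.1, 1.0],
         "apple": [0.9, 0.2], "pear": [0.2, 0.9]}

print(neighbor_overlap_stability(run_a, run_b, k=1))
print(neighbor_overlap_stability(run_a, run_c, k=1))
```

Comparing neighbor sets rather than raw vectors is a deliberate choice in this sketch: two runs can differ by a rotation of the whole space and still encode the same similarity structure, which is why `run_a` and `run_b` count as fully stable here while `run_a` and `run_c` do not.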

Taxonomy

Methods: GloVe Embeddings · fastText