Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

Manaal Faruqui; Yulia Tsvetkov; Pushpendre Rastogi; Chris Dyer

arXiv:1605.02276·cs.CL·June 23, 2016

TL;DR

This paper examines the limitations of evaluating word embeddings with word similarity tasks and argues that more reliable evaluation methods are needed.

Contribution

The paper identifies several problems with evaluating word vectors on word similarity datasets and summarizes existing solutions to them.

Findings

01 Word similarity tasks are unreliable for evaluating word vectors.

02 Current evaluation methods lack standardization and robustness.

03 The paper calls for new approaches to intrinsic evaluation of word embeddings.

Abstract

Lacking standardized extrinsic evaluation methods for vector representations of words, the NLP community has relied heavily on word similarity tasks as a proxy for intrinsic evaluation of word vectors. Word similarity evaluation, which correlates the distance between vectors and human judgments of semantic similarity, is attractive because it is computationally inexpensive and fast. In this paper, we present several problems associated with the evaluation of word vectors on word similarity datasets and summarize existing solutions. Our study suggests that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods.
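
As a rough illustration of the evaluation the abstract describes, the sketch below scores each word pair by the cosine similarity of its vectors and then computes the Spearman rank correlation between those scores and human ratings. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the toy vectors, and the toy ratings are all hypothetical, and skipping pairs with out-of-vocabulary words is one common convention rather than something the paper prescribes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate_similarity(vectors, rated_pairs):
    """Correlate model cosine scores with human ratings.

    Pairs containing a word without a vector are skipped (an assumption).
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical 2-dimensional vectors and made-up ratings, for illustration only.
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.0),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.5),
]

rho = evaluate_similarity(toy_vectors, toy_pairs)
print(rho)
```

The whole procedure reduces to one correlation coefficient over a fixed list of rated pairs, which is why it is cheap and fast to run, and also why the score depends so heavily on how those pairs were chosen and rated.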

Code & Models

Repositories

avi-jit/SWOW-eval