A Survey of Word Embeddings Evaluation Methods

Amir Bakarov

arXiv:1801.09536 · cs.CL · January 30, 2018 · 131 cites

TL;DR

This paper provides a comprehensive overview of evaluation methods for word embeddings, categorizing and analyzing 28 different intrinsic and extrinsic approaches, and discussing key challenges in the field.

Contribution

It proposes a systematic typology of word embedding evaluation methods, summarizing existing techniques and highlighting open problems and challenges.

Findings

1. 16 intrinsic evaluation methods summarized
2. 12 extrinsic evaluation methods summarized
3. Discussion of key challenges in evaluation

Abstract

Word embeddings are real-valued word representations that capture lexical semantics and are trained on natural language corpora. Models proposing these representations have gained popularity in recent years, but the question of which evaluation method is most adequate remains open. This paper presents an extensive overview of the field of word embeddings evaluation, highlighting the main problems and proposing a typology of approaches to evaluation, summarizing 16 intrinsic methods and 12 extrinsic methods. I describe both widely used and experimental methods, systematize information about evaluation datasets, and discuss some key challenges.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling