Analogies minus analogy test: measuring regularities in word embeddings

Louis Fournier; Emmanuel Dupoux; Ewan Dunbar

arXiv:2010.03446·cs.CL·October 8, 2020

TL;DR

This paper decomposes and empirically analyzes the classic arithmetic word analogy test, introduces two new metrics that measure linguistic regularities in word embeddings more reliably, and shows that several popular embeddings do encode these regularities despite the flaws in the standard test.

Contribution

It proposes two novel metrics to address issues with the classic analogy test and provides empirical evidence that popular embeddings encode linguistic regularities.

Findings

01 The standard analogy test is flawed, but embeddings still encode linguistic regularities.

02 The two new metrics distinguish class-wise offset concentration from pairing consistency.

03 Several popular embeddings exhibit both class-wise offset concentration and pairing consistency.

Abstract

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar directions between pairs of words drawn from different broad classes, such as France–London, China–Ottawa, ...) and pairing consistency (the existence of a regular transformation between correctly-matched pairs such as France:Paris::China:Beijing). We show that, while the standard analogy test is flawed, several popular word embeddings do nevertheless encode linguistic regularities.
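
To make the abstract's terms concrete, here is a minimal sketch of the classic arithmetic analogy test (answer a:b::c:? with the word nearest to b - a + c) and a rough way to look at offset directions. This is not the paper's code and not its metric definitions: the function names `analogy` and `mean_offset_cosine`, the `exclude_inputs` flag, and the hand-made three-dimensional vectors in `TOY` are all assumptions made for illustration. The mean pairwise cosine between offsets is only a simple stand-in for the idea of offsets pointing in similar directions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(emb, a, b, c, exclude_inputs=True):
    """Answer a:b::c:? with the word nearest (by cosine) to emb[b] - emb[a] + emb[c].

    exclude_inputs mirrors the common convention of removing a, b and c
    from the candidate set (an assumption of this sketch).
    """
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if not (exclude_inputs and w in (a, b, c))]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def mean_offset_cosine(emb, pairs):
    """Mean pairwise cosine similarity between the offsets emb[b] - emb[a]."""
    offsets = [[y - x for x, y in zip(emb[a], emb[b])] for a, b in pairs]
    sims = [
        cosine(offsets[i], offsets[j])
        for i in range(len(offsets))
        for j in range(i + 1, len(offsets))
    ]
    return sum(sims) / len(sims)

# Hypothetical toy vectors, hand-made so that every country-to-capital
# offset is the same translation along the second axis.
TOY = {
    "France": [1.0, 0.0, 0.2],
    "Paris": [1.0, 1.0, 0.2],
    "China": [0.0, 0.1, 1.0],
    "Beijing": [0.0, 1.1, 1.0],
    "Canada": [0.6, 0.0, 0.7],
    "Ottawa": [0.6, 1.0, 0.7],
}

matched = [("France", "Paris"), ("China", "Beijing"), ("Canada", "Ottawa")]
mismatched = [("France", "Beijing"), ("China", "Ottawa"), ("Canada", "Paris")]

answer = analogy(TOY, "France", "Paris", "China")
matched_score = mean_offset_cosine(TOY, matched)
mismatched_score = mean_offset_cosine(TOY, mismatched)
print(answer, matched_score, mismatched_score)
```

In this toy space the mismatched country-to-capital offsets still point in partly similar directions, because every capital sits higher on the second axis than every country. That is the intuition behind separating class-wise offset concentration from pairing consistency: a shared direction between two word classes does not by itself show that the correct pairs are related by a regular transformation.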

Code & Models

Repositories

bootphon/measuring-regularities-in-word-embeddings (official)