Evaluating Word Embeddings with Categorical Modularity

Sílvia Casacuberta; Karina Halevy; Damián E. Blasi

arXiv:2106.00877·cs.CL·June 3, 2021

TL;DR

This paper introduces categorical modularity, a new low-resource intrinsic metric based on graph modularity, to evaluate word embedding quality across languages and models, showing correlations with downstream tasks.

Contribution

The paper proposes categorical modularity as a novel intrinsic evaluation metric for word embeddings, applicable across multiple languages and models, with demonstrated predictive power for downstream tasks.

Findings

01

Categorical modularity shows moderate to strong positive correlations with performance on the monolingual tasks of sentiment analysis and word similarity calculation.

02

It also correlates positively with performance on the cross-lingual task of bilingual lexicon induction, both to and from English.

03

Provides insights into semantic information loss in embeddings.

Abstract

We introduce categorical modularity, a novel low-resource intrinsic metric to evaluate word embedding quality. Categorical modularity is a graph modularity metric based on the $k$-nearest neighbor graph constructed with embedding vectors of words from a fixed set of semantic categories, in which the goal is to measure the proportion of words that have nearest neighbors within the same categories. We use a core set of 500 words belonging to 59 neurobiologically motivated semantic categories in 29 languages and analyze three word embedding models per language (FastText, MUSE, and subs2vec). We find moderate to strong positive correlations between categorical modularity and performance on the monolingual tasks of sentiment analysis and word similarity calculation and on the cross-lingual task of bilingual lexicon induction both to and from English. Overall, we suggest that categorical…
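The idea described in the abstract can be illustrated with a minimal sketch: build a $k$-nearest neighbor graph over word vectors, then score how well the graph's edges stay inside semantic categories. The code below is an assumption-laden illustration, not the authors' implementation. It assumes cosine similarity, an unweighted symmetrized graph, and the standard Newman modularity formula; the function names `knn_adjacency` and `modularity`, the toy vectors, and the choice of `k` are all hypothetical.

```python
import numpy as np


def knn_adjacency(vectors, k):
    """Symmetric, unweighted k-nearest-neighbor adjacency matrix.

    Assumption: neighbors are ranked by cosine similarity.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a word is never its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]

    n = len(vectors)
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = 1
    # Keep an edge if either word lists the other among its k neighbors.
    return np.maximum(adj, adj.T)


def modularity(adj, labels):
    """Standard Newman modularity with the given labels as communities."""
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    same_category = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return float(((adj - expected) * same_category).sum() / two_m)


# Toy data (hypothetical): two tight clusters of three "words" each.
vectors = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],
])
adj = knn_adjacency(vectors, k=2)

# Categories that match the clusters versus categories that cut across them.
score = modularity(adj, [0, 0, 0, 1, 1, 1])
mixed_score = modularity(adj, [0, 1, 0, 1, 0, 1])
print(score, mixed_score)
```

In this toy setup, category labels that match the geometric clusters yield a higher score than labels that cut across them, which is the behavior the metric is meant to capture: embeddings whose nearest neighbors share a category score higher.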

Code & Models

Repositories

enscma2/categorical-modularity
Official

Taxonomy

Methods: Categorical Modularity · Support Vector Machine · fastText · Linear Regression