Should All Cross-Lingual Embeddings Speak English?

Antonios Anastasopoulos; Graham Neubig

arXiv:1911.03058·cs.CL·April 7, 2020

TL;DR

This paper critically examines the Anglocentric bias in cross-lingual embeddings, demonstrating the impact of hub language choice, expanding evaluation datasets, and proposing guidelines for more inclusive and effective multilingual embeddings.

Contribution

It challenges the default English hub assumption, expands evaluation datasets to include all language pairs, and provides guidelines for better cross-lingual embedding practices.

Findings

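01 The choice of hub language significantly affects downstream lexicon induction performance.

02 Evaluation dictionaries expanded to all language pairs reveal new challenges for established methods.

03 The analysis yields general guidelines for strong cross-lingual embedding baselines.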

Abstract

Most recent work on cross-lingual word embeddings is severely Anglocentric. The vast majority of lexicon induction evaluation dictionaries are between English and another language, and the English embedding space is selected by default as the hub when learning in a multilingual setting. With this work, however, we challenge these practices. First, we show that the choice of hub language can significantly impact downstream lexicon induction performance. Second, we expand the current evaluation dictionary collection to include all language pairs using triangulation, and we create new dictionaries for under-represented languages. Evaluating established methods over all these language pairs sheds light on their suitability and presents new challenges for the field. Finally, in our analysis we identify general guidelines for strong cross-lingual embedding baselines, based on…
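The triangulation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes dictionaries are plain lists of word pairs and joins a source-to-pivot dictionary with a pivot-to-target dictionary on the shared pivot word. The function name `triangulate` and the toy Portuguese, English, and Spanish entries are hypothetical and not taken from the paper or its repository; the authors' actual pipeline may apply additional filtering that is not shown here.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Join two bilingual dictionaries on their shared pivot words.

    Both arguments are iterables of (word, translation) pairs.
    Returns a dict mapping each source word to a sorted list of
    candidate target translations reached through the pivot.
    """
    # Index the pivot->target entries by pivot word for fast lookup.
    pivot_index = defaultdict(set)
    for pivot_word, tgt_word in pivot_to_tgt:
        pivot_index[pivot_word].add(tgt_word)

    # Follow every source->pivot entry through the index.
    triangulated = defaultdict(set)
    for src_word, pivot_word in src_to_pivot:
        for tgt_word in pivot_index.get(pivot_word, ()):
            triangulated[src_word].add(tgt_word)

    return {src: sorted(tgts) for src, tgts in triangulated.items()}


# Toy, hand-written entries (hypothetical, for illustration only).
pt_en = [("gato", "cat"), ("cão", "dog"), ("banco", "bank"), ("banco", "bench")]
en_es = [
    ("cat", "gato"),
    ("dog", "perro"),
    ("bank", "banco"),
    ("bank", "orilla"),  # "river bank" sense of the English pivot
    ("bench", "banco"),
]

pt_es = triangulate(pt_en, en_es)
for src, tgts in pt_es.items():
    print(src, "->", tgts)
```

In this toy run, Portuguese "banco" picks up Spanish "orilla" only because the English pivot "bank" is polysemous. Naive triangulation can introduce such spurious pairs, which is one reason a real pipeline would likely need a filtering step.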

Code & Models

Repositories

antonisa/embeddings (Official)
