Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?

Ivan Vulić; Goran Glavaš; Roi Reichart; Anna Korhonen

arXiv:1909.01638·cs.CL·September 5, 2019

TL;DR

This paper critically evaluates fully unsupervised cross-lingual word embedding (CLWE) methods, exposing their limitations for resource-poor and distant language pairs, and compares them against weakly supervised approaches.

Contribution

It provides a comprehensive empirical analysis showing that fully unsupervised CLWE methods often fail on challenging language pairs or underperform weakly supervised methods.

Findings

01

Fully unsupervised CLWE methods yield near-zero BLI performance for many language pairs.

02

Weakly supervised methods outperform unsupervised ones in all tested scenarios.

03

Even where they succeed, unsupervised methods do not surpass weakly supervised approaches seeded with 500-1,000 translation pairs.

Abstract

Recent efforts in cross-lingual word embedding (CLWE) learning have predominantly focused on fully unsupervised approaches that project monolingual embeddings into a shared cross-lingual space without any cross-lingual signal. The lack of any supervision makes such approaches conceptually attractive. Yet, their only core difference from (weakly) supervised projection-based CLWE methods is in the way they obtain a seed dictionary used to initialize an iterative self-learning procedure. The fully unsupervised methods have arguably become more robust, and their primary use case is CLWE induction for pairs of resource-poor and distant languages. In this paper, we question the ability of even the most robust unsupervised CLWE approaches to induce meaningful CLWEs in these more challenging settings. A series of bilingual lexicon induction (BLI) experiments with 15 diverse languages (210…
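The projection-plus-self-learning setup the abstract describes can be illustrated with a minimal sketch. It assumes row-normalized monolingual embedding matrices, an orthogonal (Procrustes) projection, and plain nearest-neighbour dictionary induction. The function names `procrustes` and `self_learning` are hypothetical and this is not the authors' implementation; practical systems use more robust retrieval and normalization steps. The only thing that would differ between a weakly supervised and a fully unsupervised run of this loop is where `seed_pairs` comes from.

```python
import numpy as np

def procrustes(X, Y):
    """Return the orthogonal matrix W minimizing ||X W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def self_learning(X, Y, seed_pairs, n_iter=5):
    """Iteratively refine a projection starting from a seed dictionary.

    X, Y: row-normalized source/target embeddings (n_src x d, n_tgt x d).
    seed_pairs: list of (source_index, target_index) translation pairs.
    """
    pairs = list(seed_pairs)
    for _ in range(n_iter):
        src = [s for s, _ in pairs]
        tgt = [t for _, t in pairs]
        # Fit the projection on the current dictionary.
        W = procrustes(X[src], Y[tgt])
        # Re-induce the dictionary: nearest target word for every source word.
        sims = (X @ W) @ Y.T
        pairs = list(enumerate(sims.argmax(axis=1)))
    return W, pairs
```

With a seed of a few hundred pairs, `seed_pairs` would be read from a small bilingual lexicon; a fully unsupervised method would instead have to induce that initial list from the monolingual spaces alone.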

Code & Models

Repositories

ivulic/panlex-bli
Official