Cross-lingual Models of Word Embeddings: An Empirical Comparison

Shyam Upadhyay; Manaal Faruqui; Chris Dyer; Dan Roth

arXiv:1604.00425 · cs.CL · June 9, 2016 · 44 cites

TL;DR

This paper systematically compares four methods for creating cross-lingual word embeddings across multiple language pairs and tasks, revealing trade-offs between supervision cost and performance.

Contribution

It provides an extensive empirical evaluation of different cross-lingual embedding approaches across various tasks and language pairs, highlighting their relative strengths and weaknesses.

Findings

01. Models that require more expensive cross-lingual supervision almost always outperform cheaply supervised ones.

02. Cheaply supervised models are often competitive on certain tasks.

03. Performance varies significantly across tasks and language pairs.

Abstract

Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on mono-lingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.
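
As a rough illustration of what an intrinsic cross-lingual similarity evaluation of the kind the abstract mentions can look like, the sketch below scores word pairs by cosine similarity in a shared embedding space and compares the scores to human ratings with Spearman correlation. This is not the authors' evaluation code: the vectors, the word pairs, the ratings, and the helper names (`cosine`, `ranks`, `spearman`) are all made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy English and German vectors in one shared space (invented numbers).
en = {"dog": [0.9, 0.1, 0.0], "house": [0.1, 0.9, 0.1], "car": [0.0, 0.2, 0.9]}
de = {"Hund": [0.9, 0.1, 0.1], "Haus": [0.2, 0.8, 0.0], "Auto": [0.1, 0.1, 0.9]}

# Hypothetical human similarity ratings for cross-lingual word pairs.
gold = [
    ("dog", "Hund", 9.5),
    ("house", "Haus", 9.0),
    ("dog", "Haus", 2.0),
    ("car", "Hund", 1.0),
]

model_scores = [cosine(en[e], de[d]) for e, d, _ in gold]
human_scores = [rating for _, _, rating in gold]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation with human ratings: {rho:.3f}")
```

A higher correlation means the embedding space orders cross-lingual word pairs more like human annotators do. The sketch assumes both languages have already been mapped into the same vector space by whichever embedding model is under evaluation.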

Code & Models

Repositories

shyamupa/biling-survey

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification