Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth

TL;DR
This paper systematically compares four methods for creating cross-lingual word embeddings across multiple language pairs and tasks, revealing trade-offs between supervision cost and performance.
Contribution
It provides an extensive empirical evaluation of four cross-lingual embedding approaches, each requiring a different form of supervision, across four tasks and four language pairs, highlighting their relative strengths and weaknesses.
Findings
Models that require more expensive cross-lingual supervision almost always perform better.
Cheaply supervised models are competitive on some tasks.
Performance varies significantly across tasks and language pairs.
Abstract
Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on mono-lingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
