A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments

Omer Levy; Anders Søgaard; Yoav Goldberg

arXiv:1608.05426 · cs.CL · January 11, 2017 · 2 cites

TL;DR

This paper shows that many cross-lingual embedding algorithms perform similarly when using sentence ID features, and suggests that incorporating additional information sources could enhance future models.

Contribution

It provides empirical and theoretical analysis linking embedding and alignment methods, highlighting the importance of sentence ID features and proposing avenues for improvement.

Findings

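01 Whether an algorithm uses sentence ID features accounts for a significant performance gap

02 Traditional alignment algorithms, such as IBM Model-1, perform comparably to state-of-the-art embedding methods

03 Information sources beyond bilingual sentence-aligned corpora could improve cross-lingual embeddings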

Abstract

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Model-1, which demonstrate similar performance to state-of-the-art embedding algorithms on a variety of benchmarks. Overall, we observe that different algorithmic approaches for utilizing the sentence ID feature space result in similar performance. This paper draws both empirical and theoretical parallels between the embedding and alignment literature, and suggests that adding additional sources of information, which go beyond the traditional signal of bilingual sentence-aligned corpora, may…
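
To make the notion of a sentence ID feature space concrete, here is a minimal sketch: each word is represented by the set of aligned-sentence indices in which it occurs, and words across languages are compared in that shared space. The Dice coefficient is used here as one simple similarity choice; this is an illustrative assumption, not necessarily the method evaluated in the paper. The toy corpus and the function names (`sentence_id_features`, `dice`, `best_translation`) are hypothetical.

```python
from collections import defaultdict


def sentence_id_features(sentences):
    """Map each word to the set of sentence IDs (line indices) it occurs in."""
    features = defaultdict(set)
    for sent_id, sentence in enumerate(sentences):
        for word in sentence.split():
            features[word].add(sent_id)
    return features


def dice(ids_a, ids_b):
    """Dice coefficient between two sets of sentence IDs."""
    if not ids_a or not ids_b:
        return 0.0
    return 2 * len(ids_a & ids_b) / (len(ids_a) + len(ids_b))


def best_translation(word, src_features, tgt_features):
    """Return the target word whose sentence ID set best matches the source word's."""
    src_ids = src_features.get(word, set())
    return max(tgt_features, key=lambda t: dice(src_ids, tgt_features[t]))


# Hypothetical toy parallel corpus: line i of each list is a translation pair,
# so both languages share the same sentence ID space.
english = ["the dog barks", "the cat sleeps", "a dog sleeps"]
french = ["le chien aboie", "le chat dort", "un chien dort"]

en_feats = sentence_id_features(english)
fr_feats = sentence_id_features(french)

print(best_translation("dog", en_feats, fr_feats))
print(best_translation("sleeps", en_feats, fr_feats))
```

Because both languages index into the same sentence IDs, no dictionary or word-level alignment is needed. The only supervision is which sentences are translations of each other.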

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification