Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces

Barun Patra; Joel Ruben Antony Moniz; Sarthak Garg; Matthew R. Gormley; Graham Neubig

arXiv:1908.06625·cs.CL·August 20, 2019·5 cites

TL;DR

This paper introduces BLISS, a semi-supervised method for bilingual lexicon induction that relaxes the isometry assumption between embedding spaces, achieving state-of-the-art results especially when spaces are non-isometric.

Contribution

The paper presents BLISS, a semi-supervised approach that relaxes the isometry assumption by combining a limited aligned bilingual lexicon with a larger set of unaligned word embeddings, together with a hubness filtering technique.

Findings

01 Achieves state-of-the-art results on 15 of 18 language pairs

02 Effectively handles non-isometric embedding spaces

03 Supervision stabilizes learning even with minimal data

Abstract

Recent work on bilingual lexicon induction (BLI) has frequently depended either on aligned bilingual lexicons or on distribution matching, often with an assumption about the isometry of the two spaces. We propose a technique to quantitatively estimate how well this isometry assumption holds between two embedding spaces and empirically show that the assumption weakens as the languages in question become increasingly etymologically distant. We then propose Bilingual Lexicon Induction with Semi-Supervision (BLISS), a semi-supervised approach that relaxes the isometric assumption while leveraging both limited aligned bilingual lexicons and a larger set of unaligned word embeddings, as well as a novel hubness filtering technique. Our proposed method obtains state-of-the-art results on 15 of 18 language pairs on the MUSE dataset, and does particularly well when the embedding spaces don't appear…
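To make the isometry assumption mentioned in the abstract concrete, the sketch below shows the standard orthogonal Procrustes alignment that supervised BLI methods commonly rely on. It is not the BLISS method and not the paper's isometry estimate; the function names (`procrustes_map`, `relative_residual`) and the synthetic data are assumptions introduced purely for illustration. It fits an orthogonal map from an aligned seed set and shows that the fit is near-exact when the target space is a rotation of the source, and leaves a clear residual when the target is also stretched unevenly.

```python
import numpy as np

def procrustes_map(X, Y):
    """Return the orthogonal W minimising ||X @ W - Y||_F for row-aligned X, Y."""
    # Closed-form solution: SVD of the cross-covariance, keep only the rotations.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def relative_residual(X, Y, W):
    """Frobenius error of the mapped source relative to the norm of the target."""
    return np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)

# Synthetic "embeddings" (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
d, n = 5, 200
X = rng.standard_normal((n, d))

# Isometric case: the target is an exact rotation/reflection of the source.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Y_iso = X @ Q

# Non-isometric case: an uneven per-axis stretch on top of the rotation.
A = np.diag(np.linspace(0.5, 2.0, d))
Y_non = Y_iso @ A

W_iso = procrustes_map(X, Y_iso)
W_non = procrustes_map(X, Y_non)
res_iso = relative_residual(X, Y_iso, W_iso)
res_non = relative_residual(X, Y_non, W_non)

print(f"isometric residual:     {res_iso:.2e}")
print(f"non-isometric residual: {res_non:.2e}")
```

In this toy setting, the residual of the best orthogonal map acts as a crude indicator of how far two spaces are from being isometric; the paper proposes its own quantitative estimate, which this sketch does not reproduce.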

Code & Models

Repositories

joelmoniz/BLISS (PyTorch, Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies