Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces
Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, Graham Neubig

TL;DR
This paper introduces BLISS, a semi-supervised method for bilingual lexicon induction that relaxes the isometry assumption between embedding spaces, achieving state-of-the-art results especially when spaces are non-isometric.
Contribution
The paper presents a novel semi-supervised approach, BLISS, which effectively handles non-isometric embedding spaces and improves bilingual lexicon induction performance.
Findings
Achieves state-of-the-art results on 15 of 18 language pairs
Effectively handles non-isometric embedding spaces
Supervision stabilizes learning even with minimal data
Abstract
Recent work on bilingual lexicon induction (BLI) has frequently depended either on aligned bilingual lexicons or on distribution matching, often with an assumption about the isometry of the two spaces. We propose a technique to quantitatively estimate this assumption of the isometry between two embedding spaces and empirically show that this assumption weakens as the languages in question become increasingly etymologically distant. We then propose Bilingual Lexicon Induction with Semi-Supervision (BLISS), a semi-supervised approach that relaxes the isometric assumption while leveraging both limited aligned bilingual lexicons and a larger set of unaligned word embeddings, as well as a novel hubness filtering technique. Our proposed method obtains state-of-the-art results on 15 of 18 language pairs on the MUSE dataset, and does particularly well when the embedding spaces don't appear…
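To make the isometry assumption concrete, here is a minimal sketch on synthetic data. It is not the estimator proposed in the paper, which the abstract does not specify. As a simple stand-in, it fits the best orthogonal map between two sets of row-aligned vectors (orthogonal Procrustes) and reports the relative residual. The function name `procrustes_residual` and all data below are assumptions made for illustration only.

```python
import numpy as np


def procrustes_residual(X, Y):
    """Relative error left after the best orthogonal map from X to Y.

    X and Y are (n_words, dim) arrays whose rows are assumed to be aligned
    translation pairs. A value near 0 means the two spaces are (nearly)
    isometric on these pairs; larger values mean no rotation fits well.
    """
    # Orthogonal Procrustes solution: W = U V^T, where U S V^T = svd(X^T Y).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    W = U @ Vt
    return np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))  # synthetic "source" embeddings

# Isometric target: a pure rotation/reflection of X.
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
Y_iso = X @ Q

# Non-isometric target: stretch each axis differently before rotating.
A = np.diag(np.linspace(0.2, 3.0, 16))
Y_non = X @ A @ Q

res_iso = procrustes_residual(X, Y_iso)
res_non = procrustes_residual(X, Y_non)
print(f"isometric residual:     {res_iso:.2e}")
print(f"non-isometric residual: {res_non:.2f}")
```

In this toy setup the rotated copy is recovered almost exactly, while the anisotropically stretched copy leaves a large residual. That gap is the kind of mismatch a purely orthogonal mapping cannot absorb, which is the situation a relaxed isometry assumption is meant to address.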
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
