Improving Bilingual Lexicon Induction with Cross-Encoder Reranking
Yaoyiran Li, Fangyu Liu, Ivan Vulić, Anna Korhonen

TL;DR
This paper introduces BLICEr, a semi-supervised post-hoc reranking method that improves bilingual lexicon induction (BLI) by combining cross-encoder similarity scores from multilingual pretrained language models (mPLMs) with similarities from existing cross-lingual word embeddings (CLWEs), achieving state-of-the-art results.
Contribution
The paper proposes a novel semi-supervised reranking approach that extracts cross-lingual lexical knowledge from mPLMs and combines it with existing CLWEs, improving BLI performance.
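The reranking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vectors, the lookup table standing in for a fine-tuned cross-encoder, and the linear interpolation with a `weight` parameter are all assumptions made for illustration. The sketch retrieves the top candidates by CLWE cosine similarity and then reorders only those candidates using a combined score.

```python
from typing import Callable, Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(
    source_word: str,
    src_vecs: Dict[str, List[float]],
    tgt_vecs: Dict[str, List[float]],
    cross_score: Callable[[str, str], float],
    top_k: int = 2,
    weight: float = 0.5,
) -> List[str]:
    """Retrieve top_k targets by CLWE similarity, then reorder them.

    The combined score is an assumed linear interpolation:
    (1 - weight) * clwe_similarity + weight * cross_encoder_score.
    """
    src = src_vecs[source_word]
    # Step 1: candidate retrieval in the precalculated CLWE space.
    candidates = sorted(
        ((cosine(src, vec), word) for word, vec in tgt_vecs.items()),
        reverse=True,
    )[:top_k]
    # Step 2: rerank only the retrieved candidates with the combined score.
    combined = [
        ((1.0 - weight) * sim + weight * cross_score(source_word, word), word)
        for sim, word in candidates
    ]
    return [word for _, word in sorted(combined, reverse=True)]


# Toy data (hypothetical): a German source word and three English targets.
src_vecs = {"Hund": [1.0, 0.0]}
tgt_vecs = {"dog": [0.8, 0.6], "hound": [0.9, 0.1], "cat": [0.0, 1.0]}

# Stand-in for a cross-encoder: a fixed table of pair scores (hypothetical values).
toy_scores = {("Hund", "dog"): 0.9, ("Hund", "hound"): 0.5, ("Hund", "cat"): 0.1}


def toy_cross_score(src: str, tgt: str) -> float:
    return toy_scores.get((src, tgt), 0.0)


clwe_only = rerank("Hund", src_vecs, tgt_vecs, toy_cross_score, weight=0.0)
reranked = rerank("Hund", src_vecs, tgt_vecs, toy_cross_score, weight=0.5)
```

With these toy numbers, the CLWE space alone ranks "hound" first, while the combined score moves "dog" to the top. Restricting the second step to the `top_k` retrieved candidates keeps the number of cross-encoder calls small, which matters because a real cross-encoder scores each word pair with a full model forward pass.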
Findings
BLICEr outperforms strong baselines on standard BLI benchmarks.
The method is robust across different CLWE spaces.
It achieves state-of-the-art results on diverse language pairs.
Abstract
Bilingual lexicon induction (BLI) with limited bilingual supervision is a crucial yet challenging task in multilingual NLP. Current state-of-the-art BLI methods rely on the induction of cross-lingual word embeddings (CLWEs) to capture cross-lingual word similarities; such CLWEs are obtained 1) via traditional static models (e.g., VecMap), or 2) by extracting type-level CLWEs from multilingual pretrained language models (mPLMs), or 3) through combining the former two options. In this work, we propose a novel semi-supervised post-hoc reranking method termed BLICEr (BLI with Cross-Encoder Reranking), applicable to any precalculated CLWE space, which improves their BLI capability. The key idea is to 'extract' cross-lingual lexical knowledge from mPLMs, and then combine it with the original CLWEs. This crucial step is done via 1) creating a word similarity dataset, comprising positive word…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: XLM-R · Cross-encoder Reranking
