Language-agnostic BERT Sentence Embedding
Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang

TL;DR
This paper introduces a multilingual BERT-based sentence embedding model that, when initialized from a pre-trained multilingual language model, needs about 80% less parallel training data, reaches 83.7% bi-text retrieval accuracy across 112 languages, and supports cross-lingual tasks such as mining parallel data.
Contribution
It combines masked language modeling (MLM), translation language modeling (TLM), dual encoder translation ranking, and additive margin softmax into a single language-agnostic BERT sentence embedding model with state-of-the-art performance.
Findings
Achieves 83.7% bi-text retrieval accuracy over 112 languages
Reduces parallel data requirements by 80% using pre-trained multilingual models
Enables training competitive NMT models using mined parallel data
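Bi-text retrieval accuracy, the headline metric above, is typically computed by embedding both sides of an aligned set of translation pairs and checking, for each source sentence, whether its nearest target by cosine similarity is its own translation. The following is a minimal brute-force sketch that assumes the embeddings are already computed and that `src[i]` aligns with `tgt[i]`. The function names are hypothetical, and the exact protocol behind the 83.7% figure (benchmark, retrieval direction, averaging over languages) is not described on this page.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity as the dot product of the L2-normalized vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def bitext_retrieval_accuracy(src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
    """Share of sources whose top-1 target (by cosine) is the aligned translation.

    Hypothetical helper: assumes src[i] and tgt[i] embed a translation pair.
    """
    if not src or len(src) != len(tgt):
        raise ValueError("src and tgt must be non-empty and aligned one-to-one")
    correct = 0
    for i, s in enumerate(src):
        scores = [cosine(s, t) for t in tgt]
        best = max(range(len(tgt)), key=lambda j: scores[j])  # top-1 neighbor
        correct += int(best == i)
    return correct / len(src)


if __name__ == "__main__":
    # Toy 2-d "embeddings": the third pair is deliberately misaligned.
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tgt = [[0.9, 0.1], [0.1, 0.9], [0.6, -0.8]]
    print(bitext_retrieval_accuracy(src, tgt))
```

Brute-force scoring costs O(N²) similarity computations; for large-scale mining of parallel data, an approximate nearest-neighbor index would normally replace the inner loop.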
Abstract
While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored. We systematically investigate methods for learning multilingual sentence embeddings by combining the best methods for learning monolingual and cross-lingual representations, including masked language modeling (MLM), translation language modeling (TLM) (Conneau and Lample, 2019), dual encoder translation ranking (Guo et al., 2018), and additive margin softmax (Yang et al., 2019a). We show that introducing a pre-trained multilingual language model dramatically reduces, by 80%, the amount of parallel training data required to achieve good performance. Composing the best of these methods produces a model that achieves 83.7% bi-text retrieval accuracy over…
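The abstract names dual encoder translation ranking with additive margin softmax as one ingredient. The sketch below shows that objective in its commonly described in-batch form: each source embedding is scored against every target in the batch by cosine similarity, a margin m is subtracted from the positive pair's score only, and a softmax cross-entropy is taken over the batch. Summing the two directions gives a symmetric (bidirectional) loss. It is pure Python for readability; the names `ams_ranking_loss` and `bidirectional_ams_loss`, the bidirectional sum, and the default `margin=0.3` are illustrative assumptions, not the authors' implementation, which per the abstract sits on top of BERT encoders pre-trained with MLM and TLM.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _logsumexp(xs: List[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ams_ranking_loss(x: Sequence[Vector], y: Sequence[Vector], margin: float = 0.3) -> float:
    """In-batch translation ranking loss with additive margin softmax (one direction).

    For each x[i], y[i] is the positive and every other y[n] in the batch is a negative:
        L = -1/N * sum_i log( e^(phi(x_i, y_i) - m)
                              / (e^(phi(x_i, y_i) - m) + sum_{n != i} e^(phi(x_i, y_n))) )
    where phi is cosine similarity and m is the margin (illustrative default).
    """
    if not x or len(x) != len(y):
        raise ValueError("x and y must be non-empty batches of aligned pairs")
    xs = [_normalize(v) for v in x]
    ys = [_normalize(v) for v in y]
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Cosine scores of x[i] against every target; the margin hits only the positive.
        logits = [sum(a * b for a, b in zip(xs[i], ys[j])) for j in range(n)]
        logits[i] -= margin
        # Cross-entropy at the positive index: logsumexp(logits) - logits[i].
        total += _logsumexp(logits) - logits[i]
    return total / n


def bidirectional_ams_loss(x: Sequence[Vector], y: Sequence[Vector], margin: float = 0.3) -> float:
    """Sum of the source->target and target->source ranking losses (assumed symmetric variant)."""
    return ams_ranking_loss(x, y, margin) + ams_ranking_loss(y, x, margin)
```

Subtracting the margin only from the positive score means a true translation must beat every in-batch negative by at least m in cosine similarity before the loss becomes small, which tends to separate translation pairs from near-misses more cleanly than a plain softmax.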
Code & Models
- 🤗 cointegrated/LaBSE-en-ru (model) · 9.2k downloads · ♡ 59
- 🤗 setu4993/LaBSE (model) · 13k downloads · ♡ 54
- 🤗 setu4993/smaller-LaBSE (model) · 638 downloads · ♡ 13
- 🤗 EIStakovskii/LaBSE-fr-de (model) · 1 download
- 🤗 Blaxzter/LaBSE-sentence-embeddings (model) · 28 downloads · ♡ 19
- 🤗 michaelfeil/ct2fast-LaBSE (model) · 358 downloads · ♡ 2
- 🤗 sartifyllc/African-Cross-Lingua-Embeddings-Model (model) · 19 downloads · ♡ 2
- 🤗 Solomennikova/labse_funetuned_cats (model)
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Multi-Head Attention · Residual Connection · Attention Is All You Need · Attention Dropout · Weight Decay · Adam · Softmax · WordPiece · Dense Connections
