Language-agnostic BERT Sentence Embedding

Fangxiaoyu Feng; Yinfei Yang; Daniel Cer; Naveen Arivazhagan; Wei Wang

arXiv:2007.01852 · cs.CL · March 9, 2022 · 195 cites

TL;DR

This paper introduces a multilingual BERT-based sentence embedding model that cuts the parallel training data needed for good performance by 80%, reaches 83.7% bi-text retrieval accuracy over 112 languages, and supports cross-lingual tasks such as mining parallel data for NMT training.

Contribution

It combines masked language modeling (MLM), translation language modeling (TLM), dual encoder translation ranking, and additive margin softmax into a single language-agnostic BERT sentence embedding model with state-of-the-art performance.

Findings

01 Achieves 83.7% bi-text retrieval accuracy over 112 languages

02 Reduces parallel data requirements by 80% using pre-trained multilingual models

03 Enables training competitive NMT models using mined parallel data
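
As an illustration of what a bi-text retrieval accuracy figure measures, here is a minimal sketch. It assumes that each source sentence has exactly one gold translation at the same index in the target list, and that retrieval picks the target with the highest cosine similarity. The paper's exact evaluation protocol is not given on this page, so the function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieval_accuracy(src, tgt):
    """Fraction of source vectors whose nearest target is the aligned one.

    Assumption: src[i] and tgt[i] are embeddings of a translation pair.
    """
    hits = 0
    for i, s in enumerate(src):
        best = max(range(len(tgt)), key=lambda j: cosine(s, tgt[j]))
        hits += (best == i)
    return hits / len(src)

# Toy embeddings (made up for illustration, not model outputs).
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.9, 0.1], [0.2, 0.8], [1.0, -1.0]]
acc = retrieval_accuracy(src, tgt)
print(f"retrieval accuracy: {acc:.3f}")
```

In this toy case the third source vector retrieves the wrong target, so two of three pairs are matched correctly.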

Abstract

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored. We systematically investigate methods for learning multilingual sentence embeddings by combining the best methods for learning monolingual and cross-lingual representations including: masked language modeling (MLM), translation language modeling (TLM) (Conneau and Lample, 2019), dual encoder translation ranking (Guo et al., 2018), and additive margin softmax (Yang et al., 2019a). We show that introducing a pre-trained multilingual language model dramatically reduces the amount of parallel training data required to achieve good performance by 80%. Composing the best of these methods produces a model that achieves 83.7% bi-text retrieval accuracy over…
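
The abstract names dual encoder translation ranking with additive margin softmax but does not spell out the loss. The sketch below shows one common way to write such an objective: an in-batch softmax over a similarity matrix, with a margin subtracted from the score of each true translation pair, summed over both retrieval directions. The function names, the margin value, and the toy similarity matrix are assumptions made for illustration, not the authors' implementation.

```python
import math

def ams_ranking_loss(sim, margin):
    """In-batch ranking loss with an additive margin on the positive pair.

    sim[i][j] is the similarity between source i and target j;
    the diagonal holds the true translation pairs (assumption).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        # Subtract the margin only from the positive (diagonal) score.
        logits = [sim[i][j] - (margin if j == i else 0.0) for j in range(n)]
        # Numerically stable log-sum-exp over the row.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / n

def bidirectional_loss(sim, margin):
    """Sum of source-to-target and target-to-source ranking losses."""
    sim_t = [list(col) for col in zip(*sim)]
    return ams_ranking_loss(sim, margin) + ams_ranking_loss(sim_t, margin)

# Toy similarity matrix and a hypothetical margin value.
sim = [[1.0, 0.0], [0.0, 1.0]]
loss_no_margin = bidirectional_loss(sim, 0.0)
loss_with_margin = bidirectional_loss(sim, 0.3)
print(loss_no_margin, loss_with_margin)
```

With the margin in place the same similarity matrix yields a larger loss, so the encoder has to separate true pairs from in-batch negatives by more than the margin before the loss stops pushing.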

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Linear Layer · Multi-Head Attention · Residual Connection · Attention Is All You Need · Attention Dropout · Weight Decay · Adam · Softmax · WordPiece · Dense Connections