InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Zewen Chi; Li Dong; Furu Wei; Nan Yang; Saksham Singhal; Wenhui Wang; Xia Song; Xian-Ling Mao; Heyan Huang; Ming Zhou

arXiv:2007.07834 · cs.CL · April 8, 2021 · 77 cites

TL;DR

This paper introduces an information-theoretic framework for cross-lingual language model pre-training that uses mutual information maximization and a contrastive learning task to improve cross-lingual transferability.

Contribution

It proposes a novel pre-training framework based on mutual information and contrastive learning, improving cross-lingual representations over existing methods.

Findings

01. Achieves better performance on multiple benchmarks

02. Effectively leverages monolingual and parallel corpora

03. Enhances cross-lingual transferability

Abstract

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are…
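The contrastive task described in the abstract can be illustrated with an InfoNCE-style loss, where each source sentence must pick out its own translation among the other translations in the batch. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `info_nce_loss`, the dot-product similarity, the temperature of 0.1, and the use of in-batch negatives are all illustrative choices that the text above does not specify.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(src_vecs, tgt_vecs, temperature=0.1):
    """Mean InfoNCE loss over a batch of sentence-pair embeddings.

    src_vecs[i] and tgt_vecs[i] are assumed to encode a translation pair
    (two views of the same meaning). For src_vecs[i], every tgt_vecs[j]
    with j != i serves as a negative example (an assumption of this sketch).
    """
    assert src_vecs and len(src_vecs) == len(tgt_vecs)
    total = 0.0
    for i, query in enumerate(src_vecs):
        # Similarity of the query to every candidate, scaled by temperature.
        logits = [dot(query, key) / temperature for key in tgt_vecs]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the true translation.
        total += log_denom - logits[i]
    return total / len(src_vecs)


if __name__ == "__main__":
    # Toy 2-d "embeddings" for two translation pairs (hypothetical data).
    src = [[1.0, 0.0], [0.0, 1.0]]
    tgt = [[1.0, 0.0], [0.0, 1.0]]
    print("aligned pairs:   ", info_nce_loss(src, tgt))
    print("misaligned pairs:", info_nce_loss(src, tgt[::-1]))
```

In this toy setup the loss is close to zero when each source vector matches its own translation and grows when the pairs are misaligned, which is the behaviour the pre-training task is meant to encourage. Minimizing a loss of this form is commonly interpreted as maximizing a lower bound on the mutual information between the two views.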

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · InfoNCE · Contrastive Multiview Coding · Residual Connection · Dropout · Byte Pair Encoding · Adam · Dense Connections