InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, Ming Zhou

TL;DR
This paper frames cross-lingual language model pre-training as maximizing mutual information between multilingual, multi-granularity texts, and adds a contrastive pre-training task to improve cross-lingual transferability.
Contribution
It proposes a novel pre-training framework based on mutual information and contrastive learning, improving cross-lingual representations over existing methods.
Findings
Achieves considerably better performance on several benchmarks
Effectively leverages monolingual and parallel corpora
Enhances cross-lingual transferability
Abstract
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are…
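The contrastive task described in the abstract can be illustrated with a small InfoNCE-style loss. This is a minimal sketch and not the authors' implementation: it assumes sentence embeddings are already computed, uses the other translations in the batch as negatives (the abstract does not say how negatives are drawn), and the function name `info_nce_loss` and the `temperature` value are hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_loss(src_vecs, tgt_vecs, temperature=0.1):
    """Mean InfoNCE loss over a batch of aligned sentence embeddings.

    src_vecs[i] and tgt_vecs[i] are assumed to encode a translation pair
    (two views of the same meaning). Every tgt_vecs[j] with j != i serves
    as a negative for src_vecs[i]; in-batch negatives are an assumption
    of this sketch.
    """
    src = [l2_normalize(v) for v in src_vecs]
    tgt = [l2_normalize(v) for v in tgt_vecs]
    total = 0.0
    for i, s in enumerate(src):
        # Similarity of source sentence i to every candidate target.
        logits = [dot(s, t) / temperature for t in tgt]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the true translation as the correct class.
        total += log_z - logits[i]
    return total / len(src)


if __name__ == "__main__":
    english = [[1.0, 0.0], [0.0, 1.0]]
    french_aligned = [[1.0, 0.0], [0.0, 1.0]]
    french_shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce_loss(english, french_aligned))
    print(info_nce_loss(english, french_shuffled))
```

The loss is small when each sentence is closest to its own translation and large when the pairs are mismatched. When every candidate looks identical to the encoder, the loss equals log N for a batch of N pairs, which corresponds to guessing the translation uniformly at random.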
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · InfoNCE · Contrastive Multiview Coding · Residual Connection · Dropout · Byte Pair Encoding · Adam · Dense Connections
