Cross-Lingual Supervision improves Large Language Models Pre-training

Andrea Schioppa; Xavier Garcia; Orhan Firat

arXiv:2305.11778 · cs.CL · May 22, 2023 · 5 cites

TL;DR

This paper shows that pre-training large language models on a mixture of self-supervised language modeling and supervised machine translation on cross-lingual parallel data improves their in-context learning abilities. The mixing ratio between the two objectives is learned during pre-training rather than found by grid search.

Contribution

The paper introduces a way to incorporate cross-lingual supervision into large language model pre-training and proposes a strategy that learns the mixing ratio between the two objectives during pre-training.

Findings

01 Models with combined objectives outperform purely self-supervised models in in-context learning.

02 The adaptive mixing strategy effectively balances objectives without extensive grid search.

03 Cross-lingual data inclusion improves multilingual understanding and transfer capabilities.

Abstract

The recent rapid progress in pre-training Large Language Models has relied on self-supervised language modeling objectives such as next token prediction or span corruption. Machine Translation systems, on the other hand, are mostly trained using cross-lingual supervision, which requires aligned data between source and target languages. We demonstrate that pre-training Large Language Models on a mixture of a self-supervised Language Modeling objective and the supervised Machine Translation objective, thereby including cross-lingual parallel data during pre-training, yields models with better in-context learning abilities. Because pre-training is very resource-intensive and a grid search for the best mixing ratio between the two objectives is prohibitively expensive, we propose a simple yet effective strategy to learn the ratio during pre-training.
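
The abstract does not say how the mixing ratio is learned. As one illustrative possibility only, the sketch below treats the two objectives as arms of an EXP3-style multi-armed bandit: each step samples an objective, observes a reward, and shifts probability toward the objective that has been paying off. The names Exp3Mixer and simulate, the exploration rate gamma, and the simulated reward values are all assumptions made for this example and are not taken from the paper. In a real run the reward would have to come from a training signal, such as a normalised drop in loss.

```python
import math
import random


class Exp3Mixer:
    """EXP3-style bandit over training objectives (illustrative assumption,
    not the paper's algorithm)."""

    def __init__(self, objectives, gamma=0.1, seed=0):
        self.objectives = list(objectives)
        self.gamma = gamma  # exploration rate (hypothetical value)
        self.weights = [1.0] * len(self.objectives)
        self.rng = random.Random(seed)

    def probabilities(self):
        """Current mixing ratio: weight share blended with uniform exploration."""
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def sample(self):
        """Pick the objective for the next batch; return its index and probability."""
        probs = self.probabilities()
        idx = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return idx, probs[idx]

    def update(self, idx, prob, reward):
        """Importance-weighted exponential update; reward is clipped to [0, 1]."""
        reward = min(max(reward, 0.0), 1.0)
        k = len(self.weights)
        self.weights[idx] *= math.exp(self.gamma * (reward / prob) / k)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(steps=2000, seed=0):
    """Toy loop with made-up Bernoulli rewards standing in for training signals."""
    mixer = Exp3Mixer(["lm", "mt"], gamma=0.1, seed=seed)
    rng = random.Random(seed + 1)
    mean_reward = {"lm": 0.3, "mt": 0.6}  # fabricated for the demo only
    counts = {"lm": 0, "mt": 0}
    for _ in range(steps):
        idx, prob = mixer.sample()
        name = mixer.objectives[idx]
        reward = 1.0 if rng.random() < mean_reward[name] else 0.0
        mixer.update(idx, prob, reward)
        counts[name] += 1
    return mixer.probabilities(), counts


if __name__ == "__main__":
    probs, counts = simulate()
    print("learned mixing ratio (lm, mt):", probs)
    print("batches drawn per objective:", counts)
```

The uniform exploration term keeps every objective's probability at or above gamma divided by the number of objectives, so neither objective is ever dropped entirely. A bandit of this kind needs only one training run, which is the cost argument the abstract makes against grid search.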

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification