Few-shot Learning with Multilingual Language Models

Xi Victoria Lin; Todor Mihaylov; Mikel Artetxe; Tianlu Wang; Shuohui Chen; Daniel Simig; Myle Ott; Naman Goyal; Shruti Bhosale; Jingfei Du; Ramakanth Pasunuru; Sam Shleifer; Punit Singh Koura; Vishrav Chaudhary; Brian O'Horo; Jeff Wang; Luke Zettlemoyer; Zornitsa Kozareva; Mona Diab; Veselin Stoyanov; Xian Li

arXiv:2112.10668 · cs.CL · November 11, 2022 · 76 citations

TL;DR

This paper trains large multilingual language models to improve few- and zero-shot learning across diverse languages, achieving state-of-the-art results in multilingual reasoning, inference, and translation tasks, while analyzing cross-lingual transfer methods.

Contribution

The paper introduces a large-scale multilingual generative language model, with 7.5 billion parameters in its largest configuration, that outperforms GPT-3 of comparable size on several multilingual tasks, and it provides insights into effective cross-lingual prompting strategies.

Findings

01. State-of-the-art few-shot performance in more than 20 representative languages.

02. Outperforms GPT-3 of comparable size in multilingual commonsense reasoning and natural language inference.

03. Strong machine translation performance on FLORES-101 from only a few examples.

Abstract

Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model with 7.5 billion parameters sets new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of 0-shot and 4-shot settings). On the FLORES-101 machine translation…
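
To make the 0-shot and 4-shot settings mentioned in the abstract concrete, the following is a minimal sketch of how a k-shot prompt might be assembled from labeled demonstrations. The function name `build_few_shot_prompt`, the `Input:`/`Answer:` template, and the sample data are hypothetical illustrations and are not taken from the paper, whose actual prompt templates are not shown here.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demos: List[Tuple[str, str]], query: str, k: int = 4
) -> str:
    """Build a k-shot prompt: up to k labeled demonstrations, then the query.

    The "Input:/Answer:" template is an assumption made for illustration.
    With k=0 the result is a zero-shot prompt containing only the query.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Each demonstration pairs an input with its gold answer.
    blocks = [f"Input: {text}\nAnswer: {label}" for text, label in demos[:k]]
    # The query is left unanswered so the model can complete it.
    blocks.append(f"Input: {query}\nAnswer:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical demonstrations in several languages.
    demos = [
        ("The sky is blue.", "true"),
        ("El fuego es frío.", "false"),
        ("Wasser ist nass.", "true"),
        ("La neige est chaude.", "false"),
    ]
    print(build_few_shot_prompt(demos, "Il ghiaccio è freddo.", k=4))
```

Setting `k=0` yields the zero-shot variant of the same prompt, so both settings can be compared with the same template.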

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Attention Dropout · Dense Connections · Adam · Linear Warmup With Cosine Annealing