Few-shot Learning with Multilingual Language Models
Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva

TL;DR
This paper trains large multilingual language models to improve few- and zero-shot learning across diverse languages, achieving state-of-the-art results in multilingual reasoning, inference, and translation tasks, while analyzing cross-lingual transfer methods.
Contribution
It introduces a large-scale multilingual generative language model that outperforms GPT-3 in multiple multilingual tasks and provides insights into effective cross-lingual prompting strategies.
Findings
State-of-the-art few-shot performance in over 20 languages.
Outperforms GPT-3 in multilingual reasoning and inference.
Excels in machine translation with limited training examples.
Abstract
Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model with 7.5 billion parameters sets new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of 0-shot and 4-shot settings). On the FLORES-101 machine translation…
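As a rough illustration of the 0-shot and few-shot settings mentioned in the abstract, the sketch below builds a prompt by concatenating k labeled demonstrations ahead of an unlabeled query. The function name `build_few_shot_prompt`, the `"{text} => {label}"` template, and the example sentences are assumptions made for illustration; they are not the templates or data used in the paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    template: str = "{text} => {label}",  # hypothetical template, not from the paper
    separator: str = "\n",
) -> str:
    """Concatenate k labeled demonstrations followed by the unlabeled query.

    With an empty `demos` list this reduces to a zero-shot prompt.
    """
    lines = [template.format(text=text, label=label) for text, label in demos]
    # Render the query with an empty label slot and drop the trailing space,
    # so the language model is asked to continue with the label.
    lines.append(template.format(text=query, label="").rstrip())
    return separator.join(lines)


# Demonstrations may be in a different language from the query
# (one possible cross-lingual prompting setup).
demos = [
    ("I loved this film.", "positive"),
    ("Das Essen war schrecklich.", "negative"),
]
zero_shot = build_few_shot_prompt([], "La película fue maravillosa.")
two_shot = build_few_shot_prompt(demos, "La película fue maravillosa.")
print(two_shot)
```

The only difference between the zero-shot and few-shot prompts here is the number of demonstrations placed before the query; the model weights are not updated in either case.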
Code & Models
- 🤗 facebook/xglm-4.5B · model · 1.2k dl · ♡ 21
- 🤗 facebook/xglm-1.7B · model · 1.3k dl · ♡ 20
- 🤗 facebook/xglm-2.9B · model · 522 dl · ♡ 10
- 🤗 facebook/xglm-564M · model · 173k dl · ♡ 54
- 🤗 facebook/xglm-7.5B · model · 1.7k dl · ♡ 60
- 🤗 ai-forever/mGPT · model · 6.9k dl · ♡ 270
- 🤗 ai-forever/mGPT-armenian · model · 20 dl · ♡ 8
- 🤗 osiria/diablo-italian-chatbot-1.3b · model · 14 dl
- 🤗 osiria/diablo-italian-base-1.3b · model · 11 dl
- 🤗 osiria/diablo-italian-base-354m · model · 9 dl
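The XGLM checkpoints listed above can likely be loaded with the Hugging Face `transformers` library. The following is a minimal sketch, assuming `transformers`, `torch`, and `sentencepiece` are installed and the model hub is reachable; the helper name `generate_continuation` and its default arguments are illustrative choices, not part of any official API.

```python
# Checkpoint names as they appear in the list above.
XGLM_CHECKPOINTS = [
    "facebook/xglm-564M",
    "facebook/xglm-1.7B",
    "facebook/xglm-2.9B",
    "facebook/xglm-4.5B",
    "facebook/xglm-7.5B",
]


def generate_continuation(
    prompt: str,
    model_id: str = "facebook/xglm-564M",  # smallest checkpoint, chosen for convenience
    max_new_tokens: int = 20,
) -> str:
    """Return the prompt plus a model-generated continuation (hypothetical helper)."""
    if model_id not in XGLM_CHECKPOINTS:
        raise ValueError(f"Unknown XGLM checkpoint: {model_id}")

    # Imported lazily so the module can be loaded without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded string contains the original prompt followed by the new tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_continuation("Bonjour, je m'appelle")` downloads the 564M checkpoint on first use; the larger checkpoints need correspondingly more memory.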
Videos
[ML Olds] Meta Research Supercluster | OpenAI GPT-Instruct | Google LaMDA | Drones fight Pigeons· youtube
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Attention Dropout · Dense Connections · Adam · Linear Warmup With Cosine Annealing
