mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko, Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Anastasia Kozlova, Tatiana Shavrina

TL;DR
This paper introduces two multilingual autoregressive GPT-like models (1.3 billion and 13 billion parameters) trained on 60 languages. The models perform strongly on a range of NLP tasks and extend NLP support to low-resource languages.
Contribution
The paper presents new large-scale multilingual GPT-like models with detailed architecture, training pipeline, and evaluation, expanding NLP capabilities for underrepresented languages.
Findings
Models perform on par with XGLM on multilingual tasks.
Training and inference are parallelized effectively with the DeepSpeed and Megatron frameworks, using a sparse attention mechanism.
Models show strong zero- and few-shot learning capabilities.
Abstract
Recent studies report that autoregressive language models can successfully solve many NLP tasks via zero- and few-shot learning paradigms, opening up new possibilities for using pre-trained language models. This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages from 25 language families using Wikipedia and the Colossal Clean Crawled Corpus. We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism; the DeepSpeed and Megatron frameworks allow us to parallelize the training and inference steps effectively. The resulting models show performance on par with the recently released XGLM models by Facebook, while covering more languages and enhancing NLP possibilities for low-resource languages of CIS countries and Russian small nations. We detail the motivation for the choices of the architecture…
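The abstract states that the GPT-3 architecture is reproduced with a sparse attention mechanism. The GPT-3 paper describes alternating dense and locally banded sparse attention patterns (as in the Sparse Transformer), but the abstract does not give mGPT's exact pattern. The following is therefore only a minimal NumPy sketch under that assumption: the window size, the even/odd layer alternation, and all function names (`dense_causal_mask`, `banded_causal_mask`, `layer_masks`, `masked_attention`) are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def dense_causal_mask(seq_len: int) -> np.ndarray:
    """True where query i may attend to key j, i.e. j <= i (no future tokens)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


def banded_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal mask limited to the `window` most recent positions, including i itself."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)


def layer_masks(num_layers: int, seq_len: int, window: int) -> list[np.ndarray]:
    """Assumed layout: even layers use dense attention, odd layers use banded attention."""
    return [
        dense_causal_mask(seq_len) if layer % 2 == 0 else banded_causal_mask(seq_len, window)
        for layer in range(num_layers)
    ]


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with disallowed positions masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # blocked positions get zero weight after softmax
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability; the diagonal is always allowed
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    masks = layer_masks(num_layers=4, seq_len=6, window=2)
    print(masks[1].astype(int))  # banded layer: each token sees itself and one predecessor
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((6, 8)) for _ in range(3))
    print(masked_attention(q, k, v, masks[1]).shape)  # (6, 8)
```

For clarity the sketch materializes the full score matrix; an efficient implementation would instead compute only the allowed blocks with block-sparse kernels, such as those in DeepSpeed's sparse attention module.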
Code & Models
- 🤗 ai-forever/mGPT (model) · 6.9k dl · ♡ 270
- 🤗 ai-forever/mGPT-armenian (model) · 20 dl · ♡ 8
- 🤗 simbolo-ai/Myanmarsar-GPT (model) · 46 dl · ♡ 5
- 🤗 DFofanov78/mGPT (model) · 2 dl
- 🤗 RichardErkhov/simbolo-ai_-_Myanmarsar-GPT-4bits (model)
- 🤗 RichardErkhov/simbolo-ai_-_Myanmarsar-GPT-8bits (model)
- 🤗 Burman-AI/mGPT-ru (model) · 2 dl
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dense Connections
