mGPT: Few-Shot Learners Go Multilingual

Oleh Shliazhko; Alena Fenogenova; Maria Tikhonova; Vladislav Mikhailov; Anastasia Kozlova; Tatiana Shavrina

arXiv:2204.07580 · cs.CL · October 13, 2023 · 47 cites

Open Access · 1 Repo · 7 Models

TL;DR

This paper introduces two multilingual autoregressive GPT-like models, with 1.3 billion and 13 billion parameters, trained on 60 languages from 25 language families. They perform on par with XGLM on multilingual tasks while covering more languages, including low-resource ones.

Contribution

The paper presents new large-scale multilingual GPT-like models with detailed architecture, training pipeline, and evaluation, expanding NLP capabilities for underrepresented languages.

Findings

01

Models perform on par with XGLM on multilingual tasks.

02

Training is effective with sparse attention and with parallelization through the Deepspeed and Megatron frameworks.

03

Models show strong zero- and few-shot learning capabilities.
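
As an illustration of what zero- and few-shot learning means in practice, the sketch below builds a prompt from k labeled demonstrations followed by an unlabeled query; an autoregressive model would then be asked to continue the text after the final "Label:". The function name `build_few_shot_prompt` and the "Text:/Label:" template are hypothetical choices made for this example and are not the prompt format used in the paper.

```python
def build_few_shot_prompt(demonstrations, query, k=2):
    """Format the first k (text, label) demonstrations, then the unlabeled query.

    With k=0 the prompt contains only the query, which is the zero-shot case.
    The "Text:/Label:" template is an assumption made for illustration.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    blocks = [f"Text: {text}\nLabel: {label}" for text, label in demonstrations[:k]]
    # The query is left without a label so the model can complete it.
    blocks.append(f"Text: {query}\nLabel:")
    return "\n\n".join(blocks)


demos = [
    ("Great film, loved it.", "positive"),
    ("Dull and far too long.", "negative"),
]
zero_shot = build_few_shot_prompt(demos, "A pleasant surprise.", k=0)
two_shot = build_few_shot_prompt(demos, "A pleasant surprise.", k=2)
print(two_shot)
```

No model weights are updated in either case: the only difference between the zero-shot and few-shot settings is how many demonstrations appear in the prompt.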

Abstract

Recent studies report that autoregressive language models can successfully solve many NLP tasks via zero- and few-shot learning paradigms, which opens up new possibilities for using the pre-trained language models. This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages from 25 language families using Wikipedia and Colossal Clean Crawled Corpus. We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism; Deepspeed and Megatron frameworks allow us to parallelize the training and inference steps effectively. The resulting models show performance on par with the recently released XGLM models by Facebook, covering more languages and enhancing NLP possibilities for low resource languages of CIS countries and Russian small nations. We detail the motivation for the choices of the architecture…
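
The abstract mentions a sparse attention mechanism without giving its details in this excerpt. As a rough sketch of the general idea only, the code below compares a dense causal attention mask with a locally banded causal mask, in which each position attends to itself and a fixed number of preceding positions. The function names and the window size of 3 are assumptions made for illustration and are not taken from the paper.

```python
def causal_mask(n):
    """Dense causal mask: position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def banded_causal_mask(n, window):
    """Locally banded causal mask: position i attends to the `window` most
    recent positions, itself included. The window size is a free parameter
    chosen here for illustration."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[(j <= i) and (i - j < window) for j in range(n)] for i in range(n)]


def count_allowed(mask):
    """Number of (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


dense = causal_mask(6)
sparse = banded_causal_mask(6, window=3)
for row in sparse:
    print("".join("x" if allowed else "." for allowed in row))
print(count_allowed(dense), count_allowed(sparse))
```

For a sequence of 6 positions the dense mask allows 21 query-key pairs and the banded mask allows 15. The gap widens with sequence length, since the banded count grows linearly in the sequence length rather than quadratically.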

Code & Models

Repositories

ai-forever/mgpt
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dense Connections