Language Models are Few-shot Multilingual Learners

Genta Indra Winata; Andrea Madotto; Zhaojiang Lin; Rosanne Liu; Jason Yosinski; Pascale Fung

arXiv:2109.07684·cs.CL·September 17, 2021

TL;DR

This paper demonstrates that large pre-trained language models like GPT and T5 can perform few-shot classification tasks across multiple languages without additional training, achieving competitive results in cross-lingual settings.

Contribution

It provides empirical evidence that large pre-trained language models exhibit multilingual capabilities through in-context few-shot learning, without any parameter updates.

Findings

01

Models can classify non-English samples using English examples as context.

02

In-context few-shot cross-lingual predictions are significantly better than random prediction.

03

Results are competitive with specialized cross-lingual models.

Abstract

General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and benchmarks when inferring instructions from very few examples. Here, we evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages without any parameter updates. We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones. Finally, we find the in-context few-shot cross-lingual prediction results of language models are significantly better than random prediction, and they are competitive compared to the existing state-of-the-art cross-lingual models.
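The in-context setup described in the abstract can be illustrated with a minimal sketch: a few labeled English examples are placed in the prompt, a non-English query is appended, and the label the model scores highest is taken as the prediction. The `text => label` template, the function names, and the `score_fn` callable are all assumptions made for illustration; they are not taken from the paper or its repository. In practice `score_fn` would wrap a language model's likelihood of the label given the prompt; here a toy stand-in is used so the block runs on its own.

```python
from typing import Callable, Sequence, Tuple


def build_prompt(examples: Sequence[Tuple[str, str]], query: str) -> str:
    """Format labeled examples, then the unlabeled query, one per line.

    The "text => label" template is a hypothetical choice for this sketch.
    """
    lines = [f"{text} => {label}" for text, label in examples]
    lines.append(f"{query} =>")
    return "\n".join(lines)


def predict(
    prompt: str,
    labels: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate label with the highest score for this prompt.

    score_fn(prompt, label) is an assumed interface; with a real model it
    could return the log-probability of `label` following `prompt`.
    No parameters are updated at any point.
    """
    return max(labels, key=lambda label: score_fn(prompt, label))


def random_baseline_accuracy(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing, assuming balanced classes."""
    return 1.0 / num_classes


# English examples as context, French query (cross-lingual setting).
english_examples = [
    ("book a flight to Paris", "flight"),
    ("play some jazz", "music"),
]
prompt = build_prompt(english_examples, "réserve un vol pour Rome")


def toy_score(prompt_text: str, label: str) -> float:
    # Stand-in for a language model; always prefers "flight".
    return 1.0 if label == "flight" else 0.0


prediction = predict(prompt, ["flight", "music"], toy_score)
print(prompt)
print("prediction:", prediction)
```

Comparing the resulting accuracy against `random_baseline_accuracy(num_classes)` gives the "better than random prediction" check mentioned in the abstract, under the assumption that test classes are roughly balanced.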

Code & Models

Repositories

gentaiscool/few-shot-lm (jax, Official)

Taxonomy

Methods

Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Weight Decay · Discriminative Fine-Tuning · SentencePiece · Cosine Annealing · Dropout