Multitask Prompted Training Enables Zero-Shot Task Generalization

Victor Sanh; Albert Webson; Colin Raffel; Stephen H. Bach; Lintang Sutawika; Zaid Alyafeai; Antoine Chaffin; Arnaud Stiegler; Teven Le Scao; Arun Raja; Manan Dey; M Saiful Bari; Canwen Xu; Urmish Thakker; Shanya Sharma Sharma; Eliza Szczechla; Taewoon Kim; Gunjan Chhablani; Nihal Nayak; Debajyoti Datta; Jonathan Chang; Mike Tian-Jian Jiang; Han Wang; Matteo Manica; Sheng Shen; Zheng Xin Yong; Harshit Pandey; Rachel Bawden; Thomas Wang; Trishala Neeraj; Jos Rozen; Abheesht Sharma; Andrea Santilli; Thibault Fevry; Jason Alan Fries; Ryan Teehan; Tali Bers; Stella Biderman; Leo Gao; Thomas Wolf; Alexander M. Rush

arXiv:2110.08207 · cs.LG · March 18, 2022 · 561 cites

TL;DR

This paper shows that explicit multitask training on prompted versions of diverse supervised datasets lets a fine-tuned encoder-decoder language model generalize zero-shot to held-out tasks, often outperforming larger models.

Contribution

The authors introduce a system for mapping natural language tasks into a human-readable prompted form, with multiple diversely worded prompts per dataset, and show that fine-tuning on the resulting multitask mixture improves zero-shot performance on held-out tasks.

Findings

01

Achieves strong zero-shot performance on standard datasets.

02

Outperforms larger models on multiple tasks.

03

Also performs well on a subset of BIG-bench tasks.

Abstract

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot…
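To make the idea of a "human-readable prompted form" concrete, here is a minimal sketch of how one supervised example could be rendered through several differently worded templates. It is an illustration only, not the authors' system: the `PromptTemplate` class, the `to_prompted` helper, the template wordings, and the label mapping are all hypothetical, and a natural language inference example is assumed as the input task.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical template: one input pattern and one target pattern."""
    name: str
    input_template: str
    target_template: str

    def apply(self, example: Dict[str, object]) -> Tuple[str, str]:
        # Fill both patterns from the example's fields; unused fields are ignored.
        return (
            self.input_template.format(**example),
            self.target_template.format(**example),
        )


# Assumed label mapping for a three-way inference task (illustrative only).
NLI_LABELS = ["yes", "maybe", "no"]

# Two differently worded prompts for the same underlying task (illustrative wording).
NLI_TEMPLATES = [
    PromptTemplate(
        name="can_we_infer",
        input_template='Suppose "{premise}" Can we infer that "{hypothesis}"? Yes, no, or maybe?',
        target_template="{label_text}",
    ),
    PromptTemplate(
        name="does_it_follow",
        input_template='{premise}\nQuestion: does it follow that "{hypothesis}"? Yes, no, or maybe?',
        target_template="{label_text}",
    ),
]


def to_prompted(example: Dict[str, object],
                templates: List[PromptTemplate]) -> List[Tuple[str, str]]:
    """Render one labeled example into (input text, target text) pairs, one per template."""
    enriched = dict(example, label_text=NLI_LABELS[int(example["label"])])
    return [template.apply(enriched) for template in templates]


if __name__ == "__main__":
    example = {"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": 0}
    for source, target in to_prompted(example, NLI_TEMPLATES):
        print(source, "->", target)
```

Each (input, target) pair is plain text, so a text-to-text model can be fine-tuned on a mixture of such pairs drawn from many datasets, and a task whose datasets were left out of the mixture can be evaluated zero-shot by applying its templates at test time.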

Videos

T0: Multitask Prompted Training Enables Zero-Shot Task Generalization | Paper Explained· youtube

Multitask Prompted Training Enables Zero-Shot Task Generalization· slideslive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Test