Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani

TL;DR
This paper demonstrates that explicit multitask prompted training on diverse datasets enables large language models to achieve strong zero-shot generalization across various tasks, often surpassing larger models.
Contribution
The authors introduce a system for converting natural language tasks into prompts and show that fine-tuning on this multitask prompted data improves zero-shot performance significantly.
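To illustrate what converting a task into prompts can look like, here is a minimal Python sketch. It is not the authors' system: the template wording, the `TEMPLATES` list, the `apply_templates` function, and the label scheme are all hypothetical and chosen only for illustration. The sketch renders one natural language inference example under two differently worded templates, each producing an (input text, target text) pair that a text-to-text model could be fine-tuned on.

```python
# Hypothetical illustration: the templates and names below are assumptions
# made for this sketch, not the paper's actual prompts or code.

# Each template pairs an input pattern with a mapping from label id to answer text.
TEMPLATES = [
    {
        "input": 'Suppose "{premise}" Can we infer that "{hypothesis}"? Yes, no, or maybe?',
        "answers": ["Yes", "Maybe", "No"],
    },
    {
        "input": "{premise}\nQuestion: Does this imply that {hypothesis}? Yes, no, or maybe?",
        "answers": ["Yes", "Maybe", "No"],
    },
]


def apply_templates(example, templates=TEMPLATES):
    """Render one labeled example into (input_text, target_text) pairs, one per template."""
    pairs = []
    for template in templates:
        input_text = template["input"].format(
            premise=example["premise"], hypothesis=example["hypothesis"]
        )
        target_text = template["answers"][example["label"]]
        pairs.append((input_text, target_text))
    return pairs


example = {
    "premise": "A dog is running through a field.",
    "hypothesis": "An animal is outdoors.",
    "label": 0,  # assumed label scheme: 0 = entailment, 1 = neutral, 2 = contradiction
}

prompted = apply_templates(example)
for input_text, target_text in prompted:
    print(input_text, "->", target_text)
```

Keeping several templates per dataset means the same underlying example is seen under different wordings, which is the "multiple prompts with diverse wording" idea described in the abstract.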
Findings
The fine-tuned model achieves strong zero-shot performance on several standard datasets.
It often outperforms substantially larger models on these tasks.
It also performs well on a subset of tasks from the BIG-bench benchmark.
Abstract
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot…
Code & Models
- 🤗 bigscience/T0 (model) · 63 dl · ♡ 83
- 🤗 bigscience/T0_3B (model) · 1.8k dl · ♡ 102
- 🤗 bigscience/T0_original_task_only (model) · 11 dl · ♡ 1
- 🤗 bigscience/T0_single_prompt (model) · 12 dl · ♡ 4
- 🤗 bigscience/T0p (model) · 20 dl · ♡ 5
- 🤗 bigscience/T0pp (model) · 35k dl · ♡ 403
- 🤗 gustavecortal/T0_3B-8bit (model) · 3 dl · ♡ 10
- 🤗 saurkulsh/T0pp (model) · 14 dl
- 🤗 crumb/gpt-j-6b-finetune-super-glue (model) · 5 dl
- 🤗 GroNLP/T0pp-sharded (model) · 2 dl · ♡ 5
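The checkpoints listed above are hosted on the Hugging Face Hub. The following is a minimal sketch of zero-shot inference with one of them. It assumes the `transformers` library and PyTorch are installed and that the checkpoint loads as a sequence-to-sequence model, in line with the encoder-decoder model the abstract describes. The helper name `zero_shot_answer`, the default checkpoint, and the example prompt are illustrative choices rather than anything prescribed by the paper.

```python
def zero_shot_answer(prompt, model_name="bigscience/T0_3B", max_new_tokens=20):
    """Generate an answer for a natural language prompt.

    Downloads the checkpoint on first use, which can take several gigabytes.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Encode the prompt, generate output token ids, and decode them back to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Illustrative prompt: the task is stated in plain language, with no examples given.
PROMPT = (
    "Is this review positive or negative? "
    "Review: this is the best cast iron skillet you will ever buy"
)
```

Calling `zero_shot_answer(PROMPT)` would run the model on the prompt. The larger checkpoints in the list need considerably more memory, so the smaller `T0_3B` variant is used as the default here.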
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Test
