Finetuned Language Models Are Zero-Shot Learners
Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

TL;DR
This paper demonstrates that instruction tuning significantly enhances the zero-shot learning capabilities of large language models, outperforming GPT-3 on various NLP tasks by finetuning on diverse instruction-based datasets.
Contribution
The authors introduce FLAN, a method of instruction tuning large language models on multiple tasks, leading to substantial improvements in zero-shot and few-shot performance.
Findings
FLAN surpasses zero-shot GPT-3 on 20 of 25 tasks.
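At 137B parameters, FLAN outperforms the larger 175B-parameter GPT-3 in the zero-shot setting, so the gains are not explained by model size alone.
Ablations identify the number of finetuning datasets, model scale, and natural language instructions as key factors.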
Abstract
This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that number of finetuning datasets, model scale, and natural language instructions are key to the…
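The core idea in the abstract, verbalizing tasks through natural language instruction templates and evaluating on task types held out from finetuning, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the template wording, the `verbalize` and `split_by_cluster` helpers, and the cluster labels are hypothetical and are not taken from the paper's released templates or code.

```python
import random

# Hypothetical instruction templates for a natural language inference task.
# The wording is illustrative only, not the paper's actual templates.
NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?\nOPTIONS:\n{options}",
    "{premise}\nBased on the paragraph above, can we conclude that "
    "\"{hypothesis}\"?\nOPTIONS:\n{options}",
]


def format_options(options):
    """Render the answer choices as a bulleted list inside the prompt."""
    return "\n".join(f"- {option}" for option in options)


def verbalize(example, templates, rng):
    """Turn one raw example into an (input, target) pair for finetuning.

    A template is sampled at random so the same task is phrased in
    several different ways across the finetuning mixture.
    """
    template = rng.choice(templates)
    prompt = template.format(
        premise=example["premise"],
        hypothesis=example["hypothesis"],
        options=format_options(example["options"]),
    )
    return {"input": prompt, "target": example["label"]}


def split_by_cluster(dataset_to_cluster, held_out_cluster):
    """Keep every dataset of the held-out task type out of finetuning."""
    train = sorted(
        name for name, cluster in dataset_to_cluster.items()
        if cluster != held_out_cluster
    )
    held_out = sorted(
        name for name, cluster in dataset_to_cluster.items()
        if cluster == held_out_cluster
    )
    return train, held_out


if __name__ == "__main__":
    example = {
        "premise": "A dog runs across the field.",
        "hypothesis": "An animal is moving.",
        "options": ["yes", "no"],
        "label": "yes",
    }
    print(verbalize(example, NLI_TEMPLATES, random.Random(0))["input"])

    # Illustrative cluster assignment for three datasets named in the abstract.
    clusters = {"ANLI": "nli", "RTE": "nli", "BoolQ": "reading_comprehension"}
    print(split_by_cluster(clusters, held_out_cluster="nli"))
```

Sampling among several templates per dataset is one way to expose the model to varied phrasings of the same task, and splitting by task type rather than by dataset is what makes the evaluation "unseen" in the sense the abstract describes.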
Code & Models
- 🤗 zirui3/flan-t5-xxl-oa_instruction · model
- 🤗 instruction-tuning-sd/cartoonizer · model · 68 dl · ♡ 77
- 🤗 instruction-tuning-sd/low-level-img-proc · model · 24 dl · ♡ 7
- 🤗 instruction-tuning-sd/scratch-cartoonizer · model · 7 dl · ♡ 7
- 🤗 instruction-tuning-sd/scratch-low-level-img-proc · model · 5 dl · ♡ 3
- 🤗 Ahrefs/flan-llama-7b-delta · model
- 🤗 AnalogMutations/cartoonizer · model · 11 dl · ♡ 4
- 🤗 LLM360/K2-Chat · model · 301 dl · ♡ 37
- 🤗 Zoyd/LLM360_K2-Chat-2_2bpw_exl2 · model
- 🤗 Zoyd/LLM360_K2-Chat-2_5bpw_exl2 · model
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Dropout · Softmax · Attention Dropout
