Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

TL;DR
This paper introduces a unified text-to-text framework for transfer learning in NLP, systematically compares pre-training objectives, architectures, unlabeled data sets, and transfer approaches, and achieves state-of-the-art results on many language understanding benchmarks.
Contribution
It casts every text-based language problem as text-to-text, uses that single format to compare transfer learning techniques on dozens of tasks, and combines the resulting insights with scale and a new data set, the Colossal Clean Crawled Corpus.
Findings
Unified text-to-text framework outperforms previous methods
Achieved state-of-the-art results on multiple NLP benchmarks
Released new data set, models, and code for future research
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and…
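To make the text-to-text idea in the abstract concrete, here is a minimal sketch of how different tasks can be reduced to plain (input string, target string) pairs. The helper names and the `TextToTextExample` class are hypothetical and written only for illustration; the task prefixes follow the style of the examples given in the paper, but the exact strings used by any released checkpoint should be checked against its documentation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TextToTextExample:
    """One training pair: both the model input and the target are plain strings."""
    source: str
    target: str


def translation_example(english: str, german: str) -> TextToTextExample:
    # Hypothetical helper: the task is identified by a text prefix on the input.
    return TextToTextExample(f"translate English to German: {english}", german)


def summarization_example(document: str, summary: str) -> TextToTextExample:
    # Hypothetical helper: same format, different prefix.
    return TextToTextExample(f"summarize: {document}", summary)


def classification_example(
    sentence: str, label_id: int, label_names: Sequence[str]
) -> TextToTextExample:
    # The class label is emitted as its text name rather than an integer index,
    # so classification uses the same string-to-string interface as generation.
    return TextToTextExample(f"cola sentence: {sentence}", label_names[label_id])


if __name__ == "__main__":
    examples = [
        translation_example("That is good.", "Das ist gut."),
        summarization_example("A long article body goes here.", "A short summary."),
        classification_example(
            "The course is jumping well.", 0, ["unacceptable", "acceptable"]
        ),
    ]
    for ex in examples:
        print(repr(ex.source), "->", repr(ex.target))
```

Because every task shares this interface, one model, one loss, and one decoding procedure can in principle be reused across translation, summarization, and classification, which is what allows the study to compare the other factors on a common footing.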
Code & Models
- 🤗 autogluon/chronos-bolt-small · model · 10.3M dl · ♡ 27
- 🤗 amazon/chronos-bolt-base · model · 6.7M dl · ♡ 81
- 🤗 google/t5-v1_1-small · model · 38k dl · ♡ 28
- 🤗 chatpig/encoder · model · 3.4k dl · ♡ 32
- 🤗 StentorLabs/Stentor2-12M · model · 124 dl · ♡ 2
- 🤗 JDBN/t5-base-fr-qg-fquad · model · 45 dl · ♡ 5
- 🤗 Modfiededition/t5-base-fine-tuned-on-jfleg · model · 11 dl · ♡ 8
- 🤗 Narrativa/byt5-base-finetuned-tweet-qa · model · 6 dl · ♡ 2
- 🤗 Narrativa/byt5-base-tweet-hate-detection · model · 324 dl · ♡ 10
- 🤗 Rachneet/t5-base-qg-hl-squadv2 · model · 8 dl
Videos
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer · YouTube
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Gated Linear Unit · Byte Pair Encoding · Multi-Head Attention · Adafactor · Residual Connection · Inverse Square Root Schedule · Attention Dropout · SentencePiece
