Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel; Noam Shazeer; Adam Roberts; Katherine Lee; Sharan Narang; Michael Matena; Yanqi Zhou; Wei Li; Peter J. Liu

arXiv:1910.10683·cs.LG·September 20, 2023·3.7k cites

TL;DR

This paper introduces a unified text-to-text framework for transfer learning in NLP, systematically comparing various approaches and achieving state-of-the-art results across multiple language understanding tasks.

Contribution

It presents a comprehensive, unified approach to transfer learning in NLP, including a new data set and models, and demonstrates improved performance on diverse benchmarks.

Findings

01. Unified text-to-text framework outperforms previous methods

02. Achieved state-of-the-art results on multiple NLP benchmarks

03. Released new data set, models, and code for future research

Abstract

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and…
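
The idea of casting every problem as text-to-text can be illustrated with a small sketch. This is not the paper's code: the `TextToTextExample` class, the `to_text_to_text` helper, the `"prefix: text"` serialization, and the specific prefix strings below are all assumptions chosen for illustration. The point is only that translation, classification, and regression-style tasks can share one (input string, output string) interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToTextExample:
    """One training pair: both sides are plain strings."""
    source: str  # text fed to the model
    target: str  # text the model is trained to produce


def to_text_to_text(prefix: str, text: str, target) -> TextToTextExample:
    # Hypothetical serialization: a task prefix, a colon, then the raw input.
    # Non-string targets (e.g. a similarity score) are rendered as text too.
    return TextToTextExample(source=f"{prefix}: {text}", target=str(target))


# Illustrative prefixes and examples (assumed, not taken from this page).
examples = [
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    to_text_to_text("classify sentiment", "A wonderful film.", "positive"),
    to_text_to_text("score similarity", "A cat sleeps. | A cat naps.", 4.5),
]

for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Because every task reduces to the same pair of strings, a single model, loss, and decoding procedure can in principle be reused across all of them, which is what makes the systematic comparison described in the abstract possible.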

Code & Models

Videos

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer · YouTube

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Gated Linear Unit · Byte Pair Encoding · Multi-Head Attention · Adafactor · Residual Connection · Inverse Square Root Schedule · Attention Dropout · SentencePiece