nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources
Piotr Nawrot

TL;DR
nanoT5 is an optimized PyTorch framework enabling efficient pre-training and fine-tuning of T5 models on limited hardware, significantly reducing computational costs while maintaining performance.
Contribution
The paper introduces nanoT5, a resource-efficient framework for T5 models that allows a T5-Base model to be pre-trained on a single GPU in 16 hours without performance loss.
Findings
Pre-trains T5-Base on a single GPU in 16 hours
Maintains performance comparable to traditional training methods
Provides open-source code, configurations, and pre-trained models
Abstract
State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands hinder a large portion of the research community. To address this challenge, we present nanoT5, a specially-optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen the accessibility to language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. We make our contributions, including configurations, codebase, pre-training insights, and pre-trained models, available to the public.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Inverse Square Root Schedule · Byte Pair Encoding · SentencePiece · Dropout
