nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources

Piotr Nawrot

arXiv:2309.02373·cs.CL·October 25, 2023

TL;DR

nanoT5 is an optimized PyTorch framework enabling efficient pre-training and fine-tuning of T5 models on limited hardware, significantly reducing computational costs while maintaining performance.

Contribution

The paper introduces nanoT5, a resource-efficient framework for T5 models that allows a T5-Base model to be pre-trained on a single GPU in 16 hours without performance loss.

Findings

01

Pre-trains T5-Base on a single GPU in 16 hours

02

Maintains performance comparable to traditional training methods

03

Provides open-source code, configurations, and pre-trained models

Abstract

State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands hinder a large portion of the research community. To address this challenge, we present nanoT5, a specially-optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen the accessibility to language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. We make our contributions, including configurations, codebase, pre-training insights, and pre-trained models, available to the public.

Code & Models

Repositories

piotrnawrot/nanot5
PyTorch · Official

Models

pnawrot/nanoT5-base
Hugging Face model · 4 downloads · 11 likes

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research

Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Inverse Square Root Schedule · Byte Pair Encoding · SentencePiece · Dropout