Training a T5 Using Lab-sized Resources
Manuel R. Ciosici, Leon Derczynski

TL;DR
This paper introduces techniques and practical recommendations for training large language models like T5 with limited resources, demonstrated through a Danish T5 case study, making advanced NLP more accessible.
Contribution
The paper presents techniques and concrete guidelines for training large language models on the resources a modest research lab might have, illustrated by a case study: the first T5 model for Danish.
Findings
Trained a large T5 model with limited resources
Provided practical recommendations for resource-constrained training
Developed the first Danish T5 model
Abstract
Training large neural language models on large datasets is resource- and time-intensive. These requirements create a barrier to entry, where those with fewer resources cannot build competitive models. This paper presents various techniques for making it possible to (a) train a large language model using resources that a modest research lab might have, and (b) train it in a reasonable amount of time. We provide concrete recommendations for practitioners, which we illustrate with a case study: a T5 model for Danish, the first for this language.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Byte Pair Encoding · Gated Linear Unit · Dense Connections · SentencePiece · Inverse Square Root Schedule
