Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$
Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares

TL;DR
This paper introduces two open-source libraries, t5x and seqio, that facilitate scaling large language models and data pipelines, enabling training of models with hundreds of billions of parameters efficiently and reproducibly.
Contribution
The paper presents t5x and seqio, new software tools that simplify building, training, and managing large-scale language models and datasets.
Findings
Used to train models with hundreds of billions of parameters.
Enabled efficient handling of multi-terabyte datasets.
Provided configurations for T5-like and GPT-like architectures.
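Handling multi-terabyte datasets on many hosts usually means splitting the data into many files, giving each host a disjoint subset, and keeping the read order reproducible. The stdlib-Python sketch below illustrates that generic idea only. It is not seqio's implementation, and the function names (`shard_files`, `epoch_order`, `iterate_host_files`) and the seed-mixing scheme are hypothetical.

```python
import random
from typing import Iterator, List

# Hypothetical helpers illustrating deterministic per-host sharding; not seqio code.


def shard_files(files: List[str], host_id: int, num_hosts: int) -> List[str]:
    """Give each host a disjoint, round-robin subset of the (sorted) file list."""
    if not 0 <= host_id < num_hosts:
        raise ValueError("host_id must be in [0, num_hosts)")
    return sorted(files)[host_id::num_hosts]


def epoch_order(files: List[str], seed: int, epoch: int) -> List[str]:
    """Shuffle deterministically: the same (seed, epoch) always gives the same order."""
    rng = random.Random(seed * 1_000_003 + epoch)  # hypothetical seed mixing
    order = list(files)
    rng.shuffle(order)
    return order


def iterate_host_files(
    files: List[str], host_id: int, num_hosts: int, seed: int, num_epochs: int
) -> Iterator[str]:
    """Yield this host's files, reshuffled reproducibly at every epoch."""
    local = shard_files(files, host_id, num_hosts)
    for epoch in range(num_epochs):
        yield from epoch_order(local, seed, epoch)
```

Because every host sorts the same file list before slicing, the shards are disjoint and together cover all files, and because the shuffle depends only on `seed` and `epoch`, restarting a job with the same arguments replays the same file order on each host.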
Abstract
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors, including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release…
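The abstract describes seqio as offering a task-based API: a named task bundles a data source with an ordered chain of preprocessing steps, so the same pipeline can be looked up by name and rebuilt for training or evaluation. The plain-Python sketch below models only that idea. `SimpleTask`, `SimpleTaskRegistry`, and the toy preprocessors are hypothetical names, not the seqio API, whose actual classes and signatures differ and cover more (for example, vocabularies and evaluation metrics).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical types for this sketch: an example is a dict of string features.
Example = Dict[str, str]
Preprocessor = Callable[[Iterable[Example]], Iterable[Example]]


@dataclass(frozen=True)
class SimpleTask:
    """A named pipeline: a source of raw examples plus ordered preprocessors."""
    name: str
    source: Callable[[], Iterable[Example]]
    preprocessors: List[Preprocessor] = field(default_factory=list)

    def get_dataset(self) -> Iterable[Example]:
        # Rebuild the pipeline from the source on each call, applying steps in order.
        ds = self.source()
        for preprocess in self.preprocessors:
            ds = preprocess(ds)
        return ds


class SimpleTaskRegistry:
    """Global name -> task mapping, so pipelines can be referenced by name."""
    _tasks: Dict[str, SimpleTask] = {}

    @classmethod
    def add(cls, task: SimpleTask) -> SimpleTask:
        if task.name in cls._tasks:
            raise ValueError(f"task {task.name!r} is already registered")
        cls._tasks[task.name] = task
        return task

    @classmethod
    def get(cls, name: str) -> SimpleTask:
        return cls._tasks[name]


def to_inputs_targets(ds: Iterable[Example]) -> Iterator[Example]:
    # Map raw QA fields onto a text-to-text "inputs"/"targets" format.
    for ex in ds:
        yield {"inputs": "question: " + ex["question"], "targets": ex["answer"]}


def lowercase(ds: Iterable[Example]) -> Iterator[Example]:
    # Lowercase every feature value.
    for ex in ds:
        yield {key: value.lower() for key, value in ex.items()}


SimpleTaskRegistry.add(
    SimpleTask(
        name="toy_qa",
        source=lambda: [{"question": "Capital of France?", "answer": "Paris"}],
        preprocessors=[to_inputs_targets, lowercase],
    )
)
```

Calling `list(SimpleTaskRegistry.get("toy_qa").get_dataset())` returns `[{"inputs": "question: capital of france?", "targets": "paris"}]`. Keeping the pipeline definition in one registered object, rather than scattered across training scripts, is what lets the same data be reproduced by name.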
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications
