Training Transformers Together
Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf

TL;DR
This paper demonstrates a collaborative approach to training a text-to-image transformer model over the Internet, addressing engineering challenges and enabling distributed contributions from multiple parties.
Contribution
It presents a practical setup for large-scale collaborative training of transformers and explains how to handle slow communication, limited memory, uneven performance between devices, and security concerns.
Findings
The trained model produces reasonably high-quality images.
The collaborative training process is feasible with proper engineering solutions.
Viewers can successfully contribute hardware to ongoing training.
Abstract
The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show…
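The abstract lists uneven performance between devices as a challenge but does not say how it is handled. As a rough illustration only, one common way to combine work from fast and slow peers is to weight each peer's gradient by the number of samples it processed in a round. The sketch below assumes that scheme; `PeerUpdate` and `average_gradients` are hypothetical names introduced here, and this is not necessarily the method used in the demonstration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerUpdate:
    """Hypothetical record of what one peer reports in a training round."""
    num_samples: int        # samples this peer processed during the round
    gradient: List[float]   # gradient averaged over those samples


def average_gradients(updates: List[PeerUpdate]) -> List[float]:
    """Return the sample-weighted mean of the peers' gradients.

    Assumption: weighting by sample count lets a slow peer contribute
    in proportion to the work it actually did.
    """
    total = sum(u.num_samples for u in updates)
    if total == 0:
        raise ValueError("no samples contributed in this round")
    averaged = [0.0] * len(updates[0].gradient)
    for u in updates:
        weight = u.num_samples / total
        for i, g in enumerate(u.gradient):
            averaged[i] += weight * g
    return averaged


# A fast peer processed three times as many samples as a slow one.
fast_peer = PeerUpdate(num_samples=30, gradient=[1.0, 2.0])
slow_peer = PeerUpdate(num_samples=10, gradient=[5.0, -2.0])
combined = average_gradients([fast_peer, slow_peer])
print(combined)  # [2.0, 1.0]
```

With these weights (0.75 and 0.25), the result equals the gradient that a single device would compute over all 40 samples, so peers of different speeds can share one optimizer step.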
Taxonomy
Topics: Scientific Computing and Data Management · Data Visualization and Analytics · Advanced Data Storage Technologies
