Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu

TL;DR
This paper shows that large-scale text-to-image diffusion transformers can be trained from scratch at low cost by combining random patch masking with a deferred masking step, architecture improvements such as mixture-of-experts layers, and synthetic data, reaching competitive results at a fraction of typical costs.
Contribution
The authors introduce a low-cost training pipeline for large-scale T2I diffusion models. It randomly masks up to 75% of image patches during training, defers that masking until a patch-mixer has processed all patches, and uses synthetic data to reduce training expenses.
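To give a feel for why dropping patches lowers cost, the sketch below estimates the forward FLOPs of one transformer layer with and without 75% masking. It uses the common rough approximation of 24·n·d² for the linear layers plus 4·n²·d for attention; the patch count (256) and hidden width (1024) are illustrative assumptions, not values taken from the paper, and the estimate ignores the patch-mixer, which still sees every patch.

```python
def layer_flops(num_tokens: int, hidden_dim: int) -> int:
    """Rough forward FLOPs for one transformer layer (approximation only).

    Linear projections and a 4x-wide MLP cost about 24 * n * d^2;
    attention scores and the weighted sum cost about 4 * n^2 * d.
    """
    linear_part = 24 * num_tokens * hidden_dim ** 2
    attention_part = 4 * num_tokens ** 2 * hidden_dim
    return linear_part + attention_part


# Illustrative (assumed) sizes: a 16x16 patch grid and a 1024-wide model.
num_patches = 256
hidden_dim = 1024
mask_ratio = 0.75

# Tokens that remain visible to the backbone after masking.
visible_tokens = int(num_patches * (1 - mask_ratio))

full_cost = layer_flops(num_patches, hidden_dim)
masked_cost = layer_flops(visible_tokens, hidden_dim)
speedup = full_cost / masked_cost

print(f"visible tokens: {visible_tokens} of {num_patches}")
print(f"estimated per-layer speedup: {speedup:.2f}x")
```

Because the linear term shrinks by 4× and the attention term by 16×, the estimated per-layer saving always falls between those two bounds; the exact figure depends on the assumed sizes.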
Findings
Achieves 12.7 FID on COCO at a total training cost of only $1,890.
Reduces training cost by 118× compared to Stable Diffusion.
Demonstrates competitive generation quality with significantly lower resource requirements.
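As a quick sanity check on how the two cost figures relate, the arithmetic below multiplies the reported budget by the reported reduction factor. The resulting baseline is only implied by those two numbers; it is not a figure quoted in this summary.

```python
# Figures reported in the findings above.
micro_budget_usd = 1_890
cost_reduction_factor = 118

# Baseline cost implied by the two reported numbers (derived, not quoted).
implied_baseline_usd = micro_budget_usd * cost_reduction_factor

print(f"implied baseline cost: ${implied_baseline_usd:,}")
```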
Abstract
As scaling laws in generative AI push performance, they also simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to address this bottleneck by demonstrating very low-cost training of large-scale T2I diffusion transformer models. As the computational cost of transformers increases with the number of patches in each image, we propose to randomly mask up to 75% of the image patches during training. We propose a deferred masking strategy that preprocesses all patches using a patch-mixer before masking, thus significantly reducing the performance degradation with masking, making it superior to model downscaling in reducing computational cost. We also incorporate the latest improvements in transformer architecture, such as the use of mixture-of-experts layers, to improve…
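The deferred masking idea described in the abstract can be sketched in a few lines. This is a minimal illustration using plain Python lists rather than tensors, not the authors' implementation: `toy_patch_mixer` and `toy_backbone` are hypothetical stand-ins, and the 256-patch input is an assumed size.

```python
import random
from typing import Callable, List, Tuple

Patch = List[float]  # one patch embedding; a stand-in for a tensor row


def sample_kept_indices(num_patches: int, mask_ratio: float,
                        rng: random.Random) -> List[int]:
    """Choose, uniformly at random, which patches survive masking."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_keep = max(1, round(num_patches * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_patches), num_keep))


def deferred_masking_forward(
    patches: List[Patch],
    patch_mixer: Callable[[List[Patch]], List[Patch]],
    backbone: Callable[[List[Patch]], List[Patch]],
    mask_ratio: float,
    rng: random.Random,
) -> Tuple[List[Patch], List[int]]:
    # 1. The patch-mixer sees every patch, so each surviving patch can
    #    carry some information about the patches that will be dropped.
    mixed = patch_mixer(patches)
    # 2. Masking is deferred until after mixing.
    kept = sample_kept_indices(len(mixed), mask_ratio, rng)
    visible = [mixed[i] for i in kept]
    # 3. The expensive backbone only processes the visible subset.
    return backbone(visible), kept


def toy_patch_mixer(patches: List[Patch]) -> List[Patch]:
    """Hypothetical mixer: adds the mean over all patches to each patch."""
    dim = len(patches[0])
    mean = [sum(p[j] for p in patches) / len(patches) for j in range(dim)]
    return [[x + m for x, m in zip(p, mean)] for p in patches]


def toy_backbone(visible: List[Patch]) -> List[Patch]:
    """Hypothetical backbone: identity, standing in for the transformer."""
    return visible


rng = random.Random(0)
patches = [[float(i), 0.0] for i in range(256)]  # assumed 16x16 patch grid
outputs, kept = deferred_masking_forward(
    patches, toy_patch_mixer, toy_backbone, mask_ratio=0.75, rng=rng
)
print(len(patches), "patches in,", len(outputs), "processed by the backbone")
```

With naive masking, step 1 would be skipped and the backbone would receive raw patches that know nothing about their masked neighbours; placing a lightweight mixer ahead of the mask is what the abstract credits with reducing the performance degradation.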
Taxonomy
Methods: Cosine Annealing · Linear Layer · Attention Dropout · Attention Is All You Need · Dense Connections · Linear Warmup With Cosine Annealing · Multi-Head Attention · Residual Connection · Dropout
