Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu

arXiv:2407.15811 · cs.CV · July 23, 2024

TL;DR

This paper demonstrates a cost-effective method for training large-scale text-to-image diffusion models from scratch using innovative masking strategies, architecture improvements, and synthetic data, achieving competitive results at a fraction of typical costs.

Contribution

The authors introduce a novel low-cost training pipeline for large-scale T2I diffusion models, utilizing random patch masking, deferred masking, and synthetic data to drastically reduce expenses.
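A back-of-the-envelope estimate shows why masking patches saves compute. The sketch below uses a common approximation of forward-pass FLOPs for one standard transformer block: the projections and MLP grow linearly with the token count N, and the attention scores grow quadratically. It compares a full patch sequence with one where 75% of the patches are dropped. The width of 1024, the 4× MLP expansion, and the 256-patch setting are illustrative assumptions rather than the paper's configuration. The estimate also ignores the extra patch-mixer layers that deferred masking runs on all patches.

```python
def layer_flops(num_tokens: int, width: int, mlp_ratio: int = 4) -> int:
    """Rough forward-pass FLOPs for one standard transformer block.

    Counts a multiply-add as 2 FLOPs and ignores layer norms, softmax,
    biases, and any extra modules (e.g. a patch-mixer or MoE routing).
    """
    n, d = num_tokens, width
    projections = 2 * 4 * n * d * d          # Q, K, V and output projections
    attention = 2 * 2 * n * n * d            # QK^T scores and attention-weighted sum of V
    mlp = 2 * 2 * n * d * (mlp_ratio * d)    # the two linear layers of the MLP
    return projections + attention + mlp


def masked_cost_ratio(num_patches: int, width: int, mask_ratio: float) -> float:
    """Cost of a block on the unmasked subset relative to the full sequence."""
    kept = int(round(num_patches * (1.0 - mask_ratio)))
    return layer_flops(kept, width) / layer_flops(num_patches, width)


if __name__ == "__main__":
    # Hypothetical setting: 32x32 latent, patch size 2 -> 256 patches.
    patches, width = 256, 1024
    for ratio in (0.0, 0.5, 0.75):
        print(f"mask {ratio:.0%}: {masked_cost_ratio(patches, width, ratio):.3f} of full cost")
```

With 75% of patches masked, the per-block cost falls to roughly a quarter of the full-sequence cost. It ends up slightly below a quarter because the quadratic attention term shrinks 16× rather than 4×. Real end-to-end savings will be smaller once the unmasked patch-mixer and other non-backbone components are counted.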

Findings

01 Achieved an FID of 12.7 on COCO at a training cost of only $1,890.

02 Reduced training cost by 118× relative to Stable Diffusion.

03 Demonstrated image quality competitive with far more expensive models at a fraction of the compute.

Abstract

As scaling laws in generative AI push performance, they also simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to address this bottleneck by demonstrating very low-cost training of large-scale T2I diffusion transformer models. As the computational cost of transformers increases with the number of patches in each image, we propose to randomly mask up to 75% of the image patches during training. We propose a deferred masking strategy that preprocesses all patches using a patch-mixer before masking, thus significantly reducing the performance degradation with masking, making it superior to model downscaling in reducing computational cost. We also incorporate the latest improvements in transformer architecture, such as the use of mixture-of-experts layers, to improve…
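The deferred masking described in the abstract can be illustrated with a toy NumPy sketch. A lightweight patch-mixer first processes all patch embeddings, and only then is a random 75% of the patches discarded. As a result, the kept tokens still carry information from the masked ones. This is not the authors' implementation; the official code is the PyTorch repository listed under Code & Models. The single bilinear self-attention layer standing in for the patch-mixer is an assumption. The helper names `random_patch_mask`, `patch_mixer`, `deferred_masking`, and `naive_masking` are hypothetical.

```python
import numpy as np

# Toy sketch of deferred masking; not the authors' implementation.


def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Return sorted indices of patches to KEEP after randomly masking `mask_ratio` of them."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    return np.sort(rng.permutation(num_patches)[:num_keep])


def patch_mixer(tokens: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a patch-mixer: one token-mixing step over ALL patches.

    A real patch-mixer would be a few transformer blocks. Here a single
    softmax self-attention layer with a residual connection lets every
    patch embedding absorb information from every other patch.
    """
    scores = tokens @ weight @ tokens.T / np.sqrt(tokens.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)            # row-wise softmax
    return tokens + attn @ tokens                        # residual connection


def deferred_masking(tokens: np.ndarray, weight: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Mix all patches first, then drop masked ones before the expensive backbone."""
    mixed = patch_mixer(tokens, weight)                  # sees 100% of patches
    keep = random_patch_mask(tokens.shape[0], mask_ratio, rng)
    return mixed[keep], keep                             # backbone sees only the kept patches


def naive_masking(tokens: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Baseline: drop patches immediately, so the backbone never sees the masked ones."""
    keep = random_patch_mask(tokens.shape[0], mask_ratio, rng)
    return tokens[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_patches, dim = 256, 64
    tokens = rng.standard_normal((num_patches, dim))
    weight = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    kept_tokens, keep = deferred_masking(tokens, weight, mask_ratio=0.75, rng=rng)
    print(kept_tokens.shape)   # 64 of the 256 patches reach the backbone
```

The comparison with `naive_masking` shows the design choice. Both functions pass the same number of tokens to the backbone. Under deferred masking, however, each kept token has already attended to every patch, and the abstract credits this with reducing the quality loss from masking.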

Code & Models

Repositories

sonyresearch/micro_diffusion
PyTorch · Official

Models

VSehwag24/MicroDiT
Hugging Face model · ♡ 30

Taxonomy

Topics: Text-to-Image Generation · Diffusion Models

Methods: Cosine Annealing · Linear Layer · Attention Dropout · Attention Is All You Need · Dense Connections · Linear Warmup With Cosine Annealing · Multi-Head Attention · Residual Connection · Dropout