Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston

TL;DR
This paper introduces a simple greedy growing method for training high-resolution pixel-based diffusion models, eliminating the need for cascaded super-resolution components and enabling stable, large-scale image generation.
Contribution
The authors propose a novel greedy architecture growth algorithm that stabilizes training and scales diffusion models to high resolutions without cascades or additional regularization.
Findings
Trained models of up to 8B parameters without extra regularization
Generated high-resolution 1024x1024 images that scored higher in human preference evaluations
Eliminated the need for cascaded super-resolution in diffusion models
Abstract
We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models, without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignment vs. high-resolution rendering. We first demonstrate the benefits of scaling a Shallow UNet, with no down(up)-sampling enc(dec)oder. Scaling its deep core layers is shown to improve alignment, object structure, and composition. Building on this core model, we propose a greedy algorithm that grows the architecture into high-resolution end-to-end models, while preserving the integrity of the pre-trained representation, stabilizing training, and reducing the need for large…
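The two-stage idea in the abstract (pre-train a resolution-preserving core, then grow down/up-sampling layers around it) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: the names `Layer`, `Model`, `make_core`, and `grow` are invented for illustration, the "layers" are fixed arithmetic functions instead of learned networks, skip connections and noise or text conditioning are omitted, and freezing the core is only one assumed way of preserving the pre-trained representation.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # toy single-channel image as nested lists


def avg_pool2(x: Image) -> Image:
    """Halve each spatial dimension by 2x2 average pooling."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)] for i in range(0, h, 2)]


def upsample2(x: Image) -> Image:
    """Double each spatial dimension by nearest-neighbour repetition."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


@dataclass
class Layer:
    name: str
    fn: Callable[[Image], Image]
    trainable: bool = True


@dataclass
class Model:
    layers: List[Layer]

    def __call__(self, x: Image) -> Image:
        for layer in self.layers:
            x = layer.fn(x)
        return x

    def trainable_names(self) -> List[str]:
        return [layer.name for layer in self.layers if layer.trainable]


def make_core() -> Model:
    # Stand-in for a pre-trained core with no down/up-sampling:
    # both blocks keep the input resolution unchanged.
    return Model([
        Layer("core_block_0", lambda x: [[0.5 * v for v in r] for r in x]),
        Layer("core_block_1", lambda x: [[v + 1.0 for v in r] for r in x]),
    ])


def grow(core: Model, num_levels: int) -> Model:
    """Wrap the core with new encoder/decoder levels (assumed scheme)."""
    # Assumption: the pre-trained core is frozen so only new layers train.
    for layer in core.layers:
        layer.trainable = False
    enc = [Layer(f"enc_down_{i}", avg_pool2) for i in range(num_levels)]
    dec = [Layer(f"dec_up_{i}", upsample2) for i in reversed(range(num_levels))]
    return Model(enc + core.layers + dec)
```

In this sketch, a core built for 4x4 inputs grown with `num_levels=2` accepts 16x16 inputs and returns 16x16 outputs, while `trainable_names()` lists only the newly added encoder and decoder layers.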
Taxonomy
Topics: Medical Imaging Techniques and Applications
Methods: Diffusion
