Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Lanqing Guo; Yingqing He; Haoxin Chen; Menghan Xia; Xiaodong Cun; Yufei Wang; Siyu Huang; Yong Zhang; Xintao Wang; Qifeng Chen; Ying Shan; Bihan Wen

arXiv:2402.10491·cs.CV·September 23, 2024·1 cites

TL;DR

This paper introduces a self-cascade diffusion model that efficiently adapts pre-trained low-resolution models to generate higher-resolution images and videos, significantly reducing training time and resource requirements.

Contribution

The proposed model transfers knowledge from a well-trained low-resolution model to adapt rapidly to higher resolutions. It comes in two forms: a tuning-free version, and a version that adds multi-scale upsamplers and needs only minimal additional training.

Findings

01 Trains 5x faster than full fine-tuning

02 Requires only 0.002M (about 2,000) tunable parameters

03 Adapts to higher resolutions in just 10k training steps

Abstract

Diffusion models have proven to be highly effective in image and video generation; however, they encounter challenges in the correct composition of objects when generating images of varying sizes due to single-scale training data. Adapting large pre-trained diffusion models to higher resolution demands substantial computational and optimization resources, yet achieving generation capabilities comparable to low-resolution models remains challenging. This paper proposes a novel self-cascade diffusion model that leverages the knowledge gained from a well-trained low-resolution image/video generation model, enabling rapid adaptation to higher-resolution generation. Building on this, we employ the pivot replacement strategy to facilitate a tuning-free version by progressively leveraging reliable semantic guidance derived from the low-resolution model. We further propose to integrate a…
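The abstract is cut off before it gives any detail on how the low-resolution result guides higher-resolution generation. As a rough illustration only, the sketch below shows one common way to reuse a low-resolution sample in a cascade: upsample it, re-noise it to an intermediate step with the standard forward-diffusion formula, and hand it to a denoiser at the higher resolution. Whether this matches the paper's pivot replacement strategy is an assumption. The names `upsample_nearest`, `add_noise`, `self_cascade_sketch`, `alpha_bar_k`, and `denoise_fn` are hypothetical and do not come from the paper or its repository.

```python
import math
import random


def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    return [
        [v for v in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def add_noise(x0, alpha_bar, rng):
    """Standard forward diffusion: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [
        [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in row]
        for row in x0
    ]


def self_cascade_sketch(low_res, factor, alpha_bar_k, denoise_fn, seed=0):
    """Hypothetical cascade stage (an assumption, not the authors' code).

    1. Upsample the low-resolution result to the target size.
    2. Re-noise it to an intermediate step K (cumulative alpha alpha_bar_k).
    3. Let a denoiser finish the remaining steps at the higher resolution.
    """
    rng = random.Random(seed)
    pivot = upsample_nearest(low_res, factor)
    noisy = add_noise(pivot, alpha_bar_k, rng)
    return denoise_fn(noisy)


if __name__ == "__main__":
    low_res = [[0.0, 1.0], [1.0, 0.0]]
    # Identity "denoiser" stands in for a real diffusion model.
    out = self_cascade_sketch(low_res, factor=2, alpha_bar_k=0.5,
                              denoise_fn=lambda x: x)
    print(len(out), len(out[0]))
```

In this sketch, a smaller `alpha_bar_k` means more noise is added, so the denoiser has more freedom to add high-resolution detail but keeps less of the low-resolution structure.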

Code & Models

Repositories

guolanqing/self-cascade (PyTorch, official)

Taxonomy

Topics: Wind and Air Flow Studies

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings