CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method
Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

TL;DR
CutDiffusion is a tuning-free method for high-resolution diffusion extrapolation. It splits the patch-wise diffusion process into a structure denoising phase followed by a detail refinement phase, which gives faster inference, lower GPU cost, and stronger generation quality.
Contribution
The paper introduces a simple, tuning-free approach to high-resolution diffusion extrapolation that divides the patch diffusion process into two focused phases, reducing cost and improving performance.
Findings
A single-step higher-resolution diffusion process speeds up inference.
Structure denoising needs fewer patches, which lowers GPU cost.
Focused detail refinement yields strong generation quality.
Abstract
Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, aimed at simplifying and accelerating the diffusion extrapolation process, making it more affordable and improving performance. CutDiffusion abides by the existing patch-wise extrapolation but cuts a standard patch diffusion process into an initial phase focused on comprehensive structure denoising and a subsequent phase dedicated to specific detail refinement. Comprehensive experiments highlight the numerous almighty advantages of CutDiffusion: (1) simple method construction that enables a concise higher-resolution diffusion process without third-party engagement; (2) fast inference speed achieved through a single-step higher-resolution diffusion process, and fewer…
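The idea of cutting one denoising schedule into two consecutive phases can be illustrated with a minimal sketch. This is not the authors' implementation: the names `split_schedule` and `run_two_phase`, the `cut_fraction` parameter, and the two step callables are all hypothetical, and the abstract does not say how the cut point is chosen or how patches are handled within each phase.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")


def split_schedule(timesteps: List[int], cut_fraction: float) -> Tuple[List[int], List[int]]:
    """Split a denoising schedule into (structure_phase, detail_phase).

    `cut_fraction` is a hypothetical knob: the share of steps given to the
    first phase. The source does not specify how the cut is chosen.
    """
    if not 0.0 <= cut_fraction <= 1.0:
        raise ValueError("cut_fraction must lie in [0, 1]")
    n_structure = round(len(timesteps) * cut_fraction)
    return timesteps[:n_structure], timesteps[n_structure:]


def run_two_phase(
    timesteps: List[int],
    cut_fraction: float,
    structure_step: Callable[[State, int], State],
    detail_step: Callable[[State, int], State],
    state: State,
) -> State:
    """Run the early steps with one update rule and the late steps with another."""
    structure_phase, detail_phase = split_schedule(timesteps, cut_fraction)
    for t in structure_phase:
        # Placeholder for the phase the abstract calls structure denoising.
        state = structure_step(state, t)
    for t in detail_phase:
        # Placeholder for the phase the abstract calls detail refinement.
        state = detail_step(state, t)
    return state


if __name__ == "__main__":
    # Toy usage: the "state" is just a log of which phase handled each step.
    schedule = list(range(9, -1, -1))  # 10 steps, from noisy (9) to clean (0)
    log = run_two_phase(
        schedule,
        0.5,
        lambda s, t: s + [("structure", t)],
        lambda s, t: s + [("detail", t)],
        [],
    )
    for phase, t in log:
        print(phase, t)
```

In a real pipeline the two callables would wrap the pre-trained model's patch-wise denoising update. Here they only record which phase handled each timestep, so the sketch shows the control flow and nothing about image quality or cost.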
Taxonomy
Topics: Advanced Numerical Analysis Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
