ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting
Zongsheng Yue, Jianyi Wang, Chen Change Loy

TL;DR
ResShift is an efficient diffusion-based image super-resolution method that needs only 15 sampling steps while achieving quality comparable or superior to existing methods.
Contribution
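The paper proposes a diffusion model that transfers between the high-resolution and low-resolution images by shifting the residual between them. This shortens the diffusion chain, accelerating image super-resolution while preserving or improving quality compared to prior methods.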
Findings
Achieves comparable or superior results with only 15 sampling steps.
Outperforms existing methods on synthetic and real-world datasets.
Eliminates the need for post-acceleration during inference, avoiding the performance deterioration it causes.
Abstract
Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed due to the requirements of hundreds or even thousands of sampling steps. Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual between them, substantially improving the transition efficiency. Additionally, an elaborate noise schedule is developed to flexibly control the shifting speed and the noise strength during the diffusion process.…
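The residual-shifting idea in the abstract can be illustrated with a small sketch. It assumes a forward marginal of the form x_t ~ N(x_0 + eta_t (y_0 - x_0), kappa^2 eta_t I), where x_0 is the high-resolution image, y_0 is the low-resolution image already resized to the same shape, eta_t grows from near 0 to near 1, and kappa scales the noise. The function names, the default values, and the geometric form of `eta_schedule` are illustrative assumptions made here, not the authors' implementation or their exact noise schedule.

```python
import math
import random


def eta_schedule(num_steps, eta_first=0.001, eta_last=0.999, power=1.0):
    """Return an increasing shifting sequence eta_1..eta_T.

    Hypothetical stand-in schedule: geometric interpolation between
    eta_first and eta_last, with `power` bending the growth rate.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    etas = []
    for t in range(num_steps):
        frac = (t / (num_steps - 1)) ** power  # 0 at the first step, 1 at the last
        etas.append(eta_first * (eta_last / eta_first) ** frac)
    return etas


def shift_sample(x0, y0, eta_t, kappa, rng):
    """Draw x_t element-wise from N(x0 + eta_t * (y0 - x0), kappa^2 * eta_t)."""
    std = kappa * math.sqrt(eta_t)
    return [
        x + eta_t * (y - x) + std * rng.gauss(0.0, 1.0)
        for x, y in zip(x0, y0)
    ]


rng = random.Random(0)
x0 = [0.2, 0.8, 0.5]  # toy "high-resolution" pixel values
y0 = [0.3, 0.6, 0.5]  # toy "low-resolution" values, assumed resized to match x0
etas = eta_schedule(15)  # 15 steps, matching the step count quoted above

x_early = shift_sample(x0, y0, etas[0], kappa=2.0, rng=rng)   # mean stays close to x0
x_late = shift_sample(x0, y0, etas[-1], kappa=2.0, rng=rng)   # mean is close to y0, plus noise
```

In this sketch the mean of x_t slides from the high-resolution image toward the low-resolution one as eta_t increases, so the chain ends near a noisy copy of the low-resolution input instead of pure Gaussian noise. That shorter distance is the intuition for why far fewer steps can suffice.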
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Sparse and Compressive Sensing Techniques
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
