Improving Consistency in Diffusion Models for Image Super-Resolution

Junhao Gu; Peng-Tao Jiang; Hao Zhang; Mi Zhou; Jinwei Chen; Wenming Yang; Bo Li

arXiv:2410.13807 · cs.CV · April 28, 2025

Open Access

TL;DR

This paper introduces ConsisSR, a novel diffusion-based super-resolution method that improves semantic and training-inference consistency using a hybrid prompt adapter and time-aware latent augmentation, achieving state-of-the-art results.

Contribution

The paper proposes ConsisSR, a framework that combines a hybrid prompt adapter with time-aware latent augmentation (TALA) to address the semantic and training-inference inconsistencies in diffusion-based image super-resolution.

Findings

01  Achieves state-of-the-art performance among diffusion-based super-resolution methods.

02  Reduces semantic inconsistency with CLIP image embeddings.

03  Bridges the training-inference gap with time-aware latent augmentation (TALA), improving reconstruction quality.

Abstract

Recent methods exploit powerful text-to-image (T2I) diffusion models for real-world image super-resolution (Real-ISR) and achieve impressive results compared to previous models. However, we observe two kinds of inconsistencies in diffusion-based methods that hinder existing models from fully exploiting diffusion priors. The first is the semantic inconsistency arising from diffusion guidance. T2I generation focuses on semantic-level consistency with text prompts, while Real-ISR emphasizes pixel-level reconstruction from low-quality (LQ) images, necessitating more detailed semantic guidance from LQ inputs. The second is the training-inference inconsistency stemming from the DDPM, which improperly assumes that high-quality (HQ) latents corrupted by Gaussian noise serve as the denoising inputs at each timestep. To address these issues, we introduce ConsisSR to handle both semantic and…
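
The training-time assumption described in the abstract can be illustrated with the standard DDPM forward process, z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps, in which the denoiser's input is built directly from the HQ latent z_0. The sketch below is only an illustration of that generic formula, not the paper's method: the linear noise schedule, the latent size, and all function names are assumptions chosen for the example. At inference the HQ latent is the unknown target, so the denoiser cannot receive an input built this way, which is the gap the abstract refers to.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumed, commonly used default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return abar_t = prod_{s<=t} (1 - beta_s) for every timestep."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def ddpm_forward(z0, t, alpha_bars, rng):
    """Corrupt a clean latent z0 with Gaussian noise at timestep t.

    Implements z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    z_t = [signal * z + noise * e for z, e in zip(z0, eps)]
    return z_t, eps


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_betas())

# Hypothetical stand-in for a flattened HQ latent (not real image data).
z_hq = [rng.gauss(0.0, 1.0) for _ in range(16)]

# Training-style denoiser input at an intermediate timestep.
z_t, eps = ddpm_forward(z_hq, 500, alpha_bars, rng)
```

As t grows, abar_t shrinks toward zero, so the training input carries less of the HQ latent and more pure noise.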

Taxonomy

Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Photoacoustic and Ultrasonic Imaging

Methods: Diffusion · Contrastive Language-Image Pre-training · Adapter