SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning

Chen Chen; Majid Abdolshah; Violetta Shevchenko; Hongdong Li; Chang Xu; Pulak Purkait

arXiv:2510.22534 · cs.CV · October 28, 2025

TL;DR

This paper introduces SRSR, a novel framework that improves semantic accuracy and reduces hallucinations in diffusion-based image super-resolution. It does so by using segmentation masks to re-focus text cross-attention and by applying text guidance only to pixels where the text is visually grounded.

Contribution

The paper proposes a plug-and-play framework built on two components, Spatially Re-focused Cross-Attention (SRCA) and Spatially Targeted Classifier-Free Guidance (STCFG), to enhance semantic fidelity in diffusion-based super-resolution.

Findings

01. Outperforms seven state-of-the-art methods in PSNR and SSIM.
02. Achieves higher perceptual quality on real-world datasets.
03. Effectively reduces semantic misalignment and hallucinations.

Abstract

Existing diffusion-based super-resolution approaches often exhibit semantic ambiguities due to inaccurate and incomplete text conditioning, coupled with the inherent tendency of cross-attention to divert toward irrelevant pixels. These limitations can lead to semantic misalignment and hallucinated details in the generated high-resolution outputs. To address them, we propose a novel, plug-and-play spatially re-focused super-resolution (SRSR) framework that consists of two core components. First, we introduce Spatially Re-focused Cross-Attention (SRCA), which refines text conditioning at inference time by applying visually grounded segmentation masks to guide cross-attention. Second, we introduce a Spatially Targeted Classifier-Free Guidance (STCFG) mechanism that selectively bypasses text influences on ungrounded pixels to prevent hallucinations. Extensive experiments…
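The abstract describes SRCA and STCFG only in words. The sketch below is a minimal, hypothetical illustration of one way the two ideas could be expressed, written in plain Python on toy per-pixel values; it is not the authors' implementation. Assumptions (not from the paper): segmentation masks are given as a binary (pixel, token) grid, pixels with no grounded token receive zero text attention, and STCFG is modeled as ordinary classifier-free guidance gated by a binary "grounded" map, so ungrounded pixels keep the unconditional prediction. The function names `masked_cross_attention` and `spatially_targeted_cfg` are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_cross_attention(logits: Matrix, mask: List[List[int]]) -> Matrix:
    """Softmax over text tokens for each pixel, restricted to grounded tokens.

    logits[i][j]: attention score between image pixel i and text token j.
    mask[i][j]:   1 if token j is grounded at pixel i (hypothetically: pixel i
                  lies inside the segmentation mask of the object token j names).
    Assumption of this sketch: a pixel with no grounded token gets all-zero
    weights, i.e. it receives no text signal from cross-attention.
    """
    weights: Matrix = []
    for row, row_mask in zip(logits, mask):
        allowed = [s for s, m in zip(row, row_mask) if m]
        if not allowed:
            weights.append([0.0] * len(row))
            continue
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def spatially_targeted_cfg(
    eps_uncond: List[float],
    eps_cond: List[float],
    grounded: List[int],
    scale: float,
) -> List[float]:
    """Classifier-free guidance applied only on grounded pixels.

    eps_uncond / eps_cond: per-pixel noise predictions without / with text.
    grounded[i]: 1 if pixel i is covered by any segmentation mask, else 0.
    Ungrounded pixels keep the unconditional prediction (text is bypassed).
    """
    return [
        u + scale * g * (c - u)
        for u, c, g in zip(eps_uncond, eps_cond, grounded)
    ]


# Toy usage: 3 pixels, 2 text tokens.
weights = masked_cross_attention(
    [[2.0, 0.5], [1.0, 3.0], [0.0, 0.0]],
    [[1, 0], [1, 1], [0, 0]],
)
print(weights[0])  # only token 0 is grounded here, so it gets weight 1.0
print(weights[2])  # no grounded token: no text attention at this pixel

guided = spatially_targeted_cfg([0.1, 0.2], [0.5, 0.9], [1, 0], scale=7.5)
print(guided)  # pixel 0 is guided toward the text; pixel 1 stays unconditional
```

Gating the guidance term, rather than replacing the conditional prediction, means a grounded pixel sees the usual CFG update while an ungrounded pixel falls back to the text-free estimate. This is one plausible reading of "selectively bypasses text influences on ungrounded pixels"; the paper may define the mechanism differently.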
