Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling
Yuan Wang, Yuhao Wan, Siming Zheng, Bo Li, Qibin Hou, Peng-Tao Jiang

TL;DR
Ada-RefSR is an adaptive, single-step diffusion framework for reference-based super-resolution. It uses reference images when they are reliable and suppresses them otherwise, which improves robustness and quality in real-world scenarios.
Contribution
The paper proposes Ada-RefSR with AICG, a novel adaptive implicit correlation gating mechanism that dynamically controls reference guidance in diffusion-based super-resolution.
Findings
Achieves a strong balance of fidelity, naturalness, and efficiency.
Robust under varying reference alignment conditions.
Outperforms existing methods on multiple datasets.
Abstract
Recent works have explored reference-based super-resolution (RefSR) to mitigate hallucinations in diffusion-based image restoration. A key challenge is that real-world degradations make correspondences between low-quality (LQ) inputs and reference (Ref) images unreliable, requiring adaptive control of reference usage. Existing methods either ignore LQ-Ref correlations or rely on brittle explicit matching, leading to over-reliance on misleading references or under-utilization of valuable cues. To address this, we propose Ada-RefSR, a single-step diffusion framework guided by a "Trust but Verify" principle: reference information is leveraged when reliable and suppressed otherwise. Its core component, Adaptive Implicit Correlation Gating (AICG), employs learnable summary tokens to distill dominant reference patterns and capture implicit correlations with LQ features. Integrated into the…
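The gating idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `gated_reference_attention`, the max-pooling over summary tokens, the sigmoid gate, and the single-head attention without learned projections are all assumptions made for illustration. It only shows the general pattern of summary tokens distilling the reference features, followed by a per-token gate that scales how much reference information is added to the LQ features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_reference_attention(lq, ref, summary_tokens):
    """Hypothetical sketch of correlation-gated reference attention.

    lq:             (N, d) low-quality (LQ) feature tokens
    ref:            (M, d) reference (Ref) feature tokens
    summary_tokens: (K, d) summary tokens (learnable in a real model)
    Returns the fused features (N, d) and the gate (N, 1).
    """
    scale = 1.0 / np.sqrt(lq.shape[-1])

    # 1. Summary tokens attend to the reference to distill dominant patterns.
    summaries = softmax(summary_tokens @ ref.T * scale) @ ref  # (K, d)

    # 2. Correlate each LQ token with the summaries and squash to (0, 1).
    #    Max-pooling over summaries and the sigmoid are assumptions.
    corr = lq @ summaries.T * scale  # (N, K)
    gate = 1.0 / (1.0 + np.exp(-corr.max(axis=-1, keepdims=True)))  # (N, 1)

    # 3. Plain cross-attention from LQ tokens to reference tokens.
    ref_out = softmax(lq @ ref.T * scale) @ ref  # (N, d)

    # 4. Residual fusion: LQ features are kept, reference features are gated.
    return lq + gate * ref_out, gate


rng = np.random.default_rng(0)
lq = rng.normal(size=(16, 8))
ref = rng.normal(size=(32, 8))
tokens = rng.normal(size=(4, 8))
fused, gate = gated_reference_attention(lq, ref, tokens)
```

In this sketch a gate near 0 leaves the LQ features almost untouched, while a gate near 1 adds the full attended reference features. In a trained model the summary tokens and projections would be learned, so the gate values here carry no meaning beyond showing the data flow.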
Peer Reviews
Decision: ICLR 2026 Poster
1. The proposed “Trust but Verify” perspective is an interesting and promising approach.
2. The authors conduct extensive experiments to validate the effectiveness of their method.
1. The key innovation of this paper is the Adaptive Implicit Correlation Gating (AICG) mechanism. However, the introduction lacks a figure illustrating the authors’ approach and its core differences from prior methods, which is an important omission.
2. The authors need to explain why their method does not perform well on the WRSR dataset.
- The paper clearly defines the problem of over- or under-reliance on reference images in diffusion-based SR and provides a coherent conceptual framework to address it.
- The proposed AICG module is lightweight and easily pluggable into existing backbones, enabling adaptive control of reference guidance without additional supervision.
- This paper provides clear visual evidence, such as attention maps, gating masks, and token visualizations, that helps interpret the mechanism’s behavior.
- The main limitation lies in the novelty boundary of AICG. Its design using learnable tokens for implicit correlation modeling is conceptually close to DETR-style query or prototype aggregation, and the paper does not clearly explain how it fundamentally differs from those approaches.
- The paper also lacks comparisons with strong face-specific RefSR methods, which would help validate the generalization of Ada-RefSR in specialized domains.
- There is no detailed analysis of different refere
1. The paper proposes AICG, designed for the RefSR task. In addition to reference feature selection, Ada-RefSR injects LQ features directly through a residual connection, which allows the model to preserve prior knowledge. A gating mechanism is also applied in the reference feature attention components to adaptively select useful information from the reference image.
2. The model achieves SOTA performance with an efficient one-step diffusion model.
3. The idea is straightforward and easy to follow.
1. The novelty is limited: most of the design choices, such as the LQ feature residual connection, are straightforward and have already been proposed in classic non-diffusion methods. The paper should discuss its novelty specifically for the RefSR task.
2. Although efficiency is claimed as one of the contributions, the paper provides insufficient discussion and experiments on efficiency and model size.
3. The visualizations are mostly derived from animal images. Are there results and visual comparisons on m
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Video Quality Assessment · Advanced Image Fusion Techniques
