Learning to Balance: Decoupled Siamese Diffusion Transformer for Reference-Based Remote Sensing Image Super-Resolution

Bin Luo, Runmin Dong, Zhaoyang Luo, Jinxiao Zhang, Jiyao Zhao, Fan Wei, and Haohuan Fu

arXiv:2605.17980 · cs.CV · May 19, 2026

TL;DR

This paper introduces DS-DiT, a decoupled Siamese diffusion transformer for reference-based remote sensing image super-resolution (RefSR) that balances use of the reference image against recovery of fine detail.

Contribution

The paper proposes a decoupled attention mechanism and a patch-level weighting module to improve super-resolution quality in remote sensing images.

Findings

01. DS-DiT outperforms existing methods in quantitative metrics.

02. The approach enhances visual fidelity of super-resolved images.

03. The method effectively balances reference reliance and detail recovery.

Abstract

Diffusion-based methods demonstrate significant potential for remote sensing image super-resolution at large scaling factors, particularly in reference-based super-resolution (RefSR) where high-resolution reference images provide critical fine-grained texture priors. However, existing methods often suffer from a trade-off between over-reliance on reference information, which leads to texture artifacts, and underutilization, which results in insufficient detail recovery. To address these issues, we propose DS-DiT, a Decoupled Siamese Diffusion Transformer method that decouples low-resolution and reference interactions at the attention level. By enabling low-resolution structural priors and reference texture information to interact independently with the noisy latent, the framework effectively mitigates inter-source competition. Furthermore, to compensate for the limited local modeling…
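
The decoupling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the DS-DiT implementation: it assumes single-head attention with no learned projections and a plain residual sum of the two branches, and the function names (`cross_attention`, `decoupled_attention`, `joint_attention`) are hypothetical. It only shows the difference between letting the noisy latent attend to low-resolution and reference tokens separately and letting both sources share one softmax.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    # Single-head scaled dot-product attention.
    # Assumption: no learned Q/K/V projections, to keep the sketch short.
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context


def decoupled_attention(latent, lr_tokens, ref_tokens):
    # Each source gets its own softmax, so reference tokens cannot take
    # attention mass away from low-resolution tokens (or vice versa).
    lr_out = cross_attention(latent, lr_tokens)
    ref_out = cross_attention(latent, ref_tokens)
    # Assumption: the two branches are merged by a residual sum.
    return latent + lr_out + ref_out


def joint_attention(latent, lr_tokens, ref_tokens):
    # Baseline for contrast: both sources compete inside one softmax.
    context = np.concatenate([lr_tokens, ref_tokens], axis=0)
    return latent + cross_attention(latent, context)


rng = np.random.default_rng(0)
latent = rng.normal(size=(16, 32))      # noisy latent tokens
lr_tokens = rng.normal(size=(16, 32))   # low-resolution structural tokens
ref_tokens = rng.normal(size=(64, 32))  # reference texture tokens

out = decoupled_attention(latent, lr_tokens, ref_tokens)
baseline = joint_attention(latent, lr_tokens, ref_tokens)
```

In the decoupled version the low-resolution branch is unaffected by how many reference tokens are supplied, whereas in the joint baseline a larger reference set dilutes the attention weights available to the low-resolution tokens.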
