Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
Zihao Wang, Yuzhou Chen, Shaogang Ren

TL;DR
This paper introduces an adaptive diffusion model that dynamically manages domain shifts during cross-modality image translation, improving fidelity and efficiency by embedding domain-shift dynamics into the generative process.
Contribution
It proposes a novel method that predicts spatially varying mixing fields and incorporates explicit restoration terms, enabling on-manifold updates and reducing semantic drift.
Findings
Improves structural fidelity and semantic consistency across tasks.
Reduces the number of denoising steps needed for convergence.
Enhances robustness in medical imaging, remote sensing, and electroluminescence mapping.
Abstract
Cross-modal image translation remains brittle and inefficient. Standard diffusion approaches often rely on a single, global linear transfer between domains. We find that this shortcut forces the sampler to traverse off-manifold, high-cost regions, inflating the correction burden and inviting semantic drift. We refer to this shared failure mode as fixed-schedule domain transfer. In this paper, we embed domain-shift dynamics directly into the generative process. Our model predicts a spatially varying mixing field at every reverse step and injects an explicit, target-consistent restoration term into the drift. This in-step guidance keeps large updates on-manifold and shifts the model's role from global alignment to local residual correction. We provide a continuous-time formulation with an exact solution form and derive a practical first-order sampler that preserves marginal consistency.…
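To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the authors' sampler. It compares a single global linear mix with a per-pixel mixing field, then shows a toy update that pulls the current state toward the mixed anchor. The names `global_linear_mix`, `spatial_mix`, `toy_reverse_step`, `step_size`, and `noise_scale` are hypothetical. The mixing field is set by hand here, whereas the paper predicts it with the model at every reverse step. The convention that a weight of 1 means "fully source" is also an assumption.

```python
import numpy as np


def global_linear_mix(x_src, x_tgt, eta):
    """Fixed-schedule transfer: one scalar weight eta in [0, 1] for all pixels."""
    return (1.0 - eta) * x_tgt + eta * x_src


def spatial_mix(x_src, x_tgt, mix_field):
    """Spatially varying transfer: mix_field has the image shape, values in [0, 1].

    Assumed convention: 1 keeps the source pixel, 0 keeps the target estimate.
    """
    return (1.0 - mix_field) * x_tgt + mix_field * x_src


def toy_reverse_step(x_t, x_src, x0_hat, mix_field, step_size, noise_scale, rng):
    """One illustrative Euler-style update (hypothetical, not the paper's sampler).

    The restoration term pulls x_t toward the per-pixel mixed anchor;
    Gaussian noise scaled by noise_scale is then added.
    """
    anchor = spatial_mix(x_src, x0_hat, mix_field)
    restoration = anchor - x_t
    noise = noise_scale * rng.standard_normal(x_t.shape)
    return x_t + step_size * restoration + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_src = rng.standard_normal((4, 4))   # stand-in for a source-modality image
    x0_hat = rng.standard_normal((4, 4))  # stand-in for the current target estimate
    x_t = rng.standard_normal((4, 4))     # stand-in for the current sampler state

    # Hand-set field: left half leans on the source, right half on the target.
    field = np.zeros((4, 4))
    field[:, :2] = 0.8
    field[:, 2:] = 0.1

    x_next = toy_reverse_step(x_t, x_src, x0_hat, field,
                              step_size=0.5, noise_scale=0.05, rng=rng)
    print(x_next.shape)
```

A constant field reproduces the global linear mix exactly, so in this toy setting the spatially varying version is a strict generalization of the fixed-schedule one.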
Peer Reviews
Decision · ICLR 2026 Poster
1. The paper is well-motivated and conceptually clear. It generalizes traditional linear domain-shift formulations to a nonlinear, manifold-aware framework, supported by solid theoretical analysis and proofs.
2. The method is technically sound and demonstrates strong generality, making it straightforward to apply across different diffusion architectures and cross-modality image translation tasks.
3. Extensive experiments on three benchmarks (IXI, Sentinel, and PSCDE) show consistent gains in var
1. The method reduces to a linear domain-shift scheme when only a single diffusion step is used, implying that it still depends on multiple denoising steps for stable performance. This reliance could pose an efficiency limitation and hinder direct adaptation to flow-matching or single-step generative methods.
2. While the method shows clear conceptual advances, its quantitative improvements over DOSSR on the Sentinel and IXI datasets are relatively minor, indicating that the advantages may be l
+ Domain adaptation for diffusion models is a growing and challenging topic, and addressing domain shift in generative tasks is an important research direction, especially as diffusion models become widely deployed across diverse domains.
+ The proposed ALSN approach is computationally efficient and easy to integrate into existing diffusion pipelines, making it potentially attractive for practitioners seeking domain-robust generative models.
- The core idea, i.e., adjusting latent normalization statistics to align source and target distributions, closely resembles well-known techniques in domain adaptation. The paper primarily recontextualizes these ideas within diffusion models without offering substantial theoretical or methodological innovation. This limits the paper’s conceptual contribution.
- While results are shown, the paper does not convincingly explain why ALSN improves performance or how it interacts with diffusion timest
– Across three modalities/datasets, the method reports better SSIM/PSNR and strong PSCDE structure metrics.
– The concept of putting domain-shift adaptation inside the dynamics (rather than as external guidance) is principled.
– The paper is difficult to follow: many equations and notations are introduced without clear motivation or explanation of each component’s role, making the method hard to reconstruct; it focuses more on how things are done than why they are needed.
– The proposed solution is fairly incremental, since the core contribution is an adaptive interpolation of $\hat{x}^{src}_0$ and $x_0$ that replaces the time-varying interpolation of source and target [1].
– Efficiency claims are supported largely
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
