TL;DR
FRAMER is a novel training scheme for real-world image super-resolution that leverages diffusion priors and frequency-aware distillation to enhance high-frequency detail reconstruction without altering the model architecture.
Contribution
It introduces a frequency-aligned self-distillation method with adaptive modulation, improving super-resolution quality by decomposing features into LF/HF bands and applying contrastive losses.
Findings
Consistently improves PSNR/SSIM across multiple backbones.
Enhances perceptual quality metrics like LPIPS and NIQE.
Validates effectiveness through ablation studies.
Abstract
Real-image super-resolution (Real-ISR) seeks to recover high-resolution (HR) images from low-resolution (LR) inputs with mixed, unknown degradations. While diffusion models surpass GANs in perceptual quality, they under-reconstruct high-frequency (HF) details due to a low-frequency (LF) bias and a depth-wise "low-first, high-later" hierarchy. We introduce FRAMER, a plug-and-play training scheme that exploits diffusion priors without changing the backbone or inference. At each denoising step, the final-layer feature map teaches all intermediate layers. Teacher and student feature maps are decomposed into LF/HF bands via FFT masks to align supervision with the model's internal frequency hierarchy. For LF, an Intra Contrastive Loss (IntraCL) stabilizes globally shared structure. For HF, an Inter Contrastive Loss (InterCL) sharpens instance-specific details using random-layer and in-batch negatives. Two adaptive modulators,…
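As a rough illustration of the LF/HF decomposition via FFT masks mentioned in the abstract, the sketch below splits a feature map into two complementary bands with a circular low-pass mask. This is not the authors' implementation: the function name `split_frequency_bands`, the `radius_ratio` cutoff, and the circular mask shape are all assumptions made for the example, and NumPy stands in for whatever framework the paper uses.

```python
import numpy as np

def split_frequency_bands(feat, radius_ratio=0.25):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    radius_ratio is a hypothetical cutoff: the low-pass radius as a
    fraction of half the shorter spatial side. The abstract does not
    specify the mask shape or cutoff.
    """
    _, h, w = feat.shape
    # Centered 2D spectrum over the spatial axes (DC lands at [h//2, w//2]).
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))

    # Circular low-pass mask around the DC component.
    yy, xx = np.meshgrid(np.arange(h) - h // 2,
                         np.arange(w) - w // 2,
                         indexing="ij")
    radius = radius_ratio * min(h, w) / 2
    lf_mask = (yy ** 2 + xx ** 2) <= radius ** 2

    def to_spatial(s):
        # Undo the shift and invert the FFT; the input is real, so keep
        # only the real part of the result.
        return np.fft.ifft2(np.fft.ifftshift(s, axes=(-2, -1)),
                            axes=(-2, -1)).real

    lf = to_spatial(spec * lf_mask)    # low-frequency band
    hf = to_spatial(spec * ~lf_mask)   # complementary high-frequency band
    return lf, hf
```

Because the two masks are complementary, the bands sum back to the original feature map, so LF and HF supervision in such a scheme would together cover the full signal. In a distillation setup along the lines the abstract describes, the same split would presumably be applied to both the teacher (final-layer) and student (intermediate-layer) features before computing the per-band losses.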
