Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features
Jingyi Xu, Meisong Zheng, Ying Chen, Minglang Qiao, Xin Deng, Mai Xu

TL;DR
This paper introduces DGAF-VSR, a novel diffusion model-based video super-resolution method that leverages dense feature guidance and aligned features to improve perceptual quality, fidelity, and temporal consistency.
Contribution
It proposes a new framework with an optical guided warping module and a feature-wise temporal condition module to better align and compensate video frames in the feature domain.
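As a rough illustration of flow-guided alignment in the feature domain, the sketch below backward-warps a feature map with an optical-flow field after upscaling both, then pools the result back to the original size. It is not the paper's module: the function names, the nearest-neighbour upscaling, the average pooling, and the (H, W, C) layout are all assumptions made for this example, and the summary does not describe how the actual warping or temporal conditioning is implemented.

```python
import numpy as np

def upscale_nearest(x, s):
    """Nearest-neighbour upscaling of an (H, W, C) array by an integer factor s."""
    return x.repeat(s, axis=0).repeat(s, axis=1)

def backward_warp(feat, flow):
    """Bilinearly sample feat (H, W, C) at positions displaced by flow (H, W, 2).

    flow[..., 0] is the horizontal (x) displacement and flow[..., 1] the
    vertical (y) displacement, both in pixels. Out-of-range samples are clamped.
    """
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def warp_at_scale(feat, flow, s):
    """Hypothetical helper: warp at s-times resolution, then average-pool back."""
    h, w, c = feat.shape
    feat_up = upscale_nearest(feat, s)
    flow_up = upscale_nearest(flow, s) * s  # displacements grow with resolution
    warped = backward_warp(feat_up, flow_up)
    return warped.reshape(h, s, w, s, c).mean(axis=(1, 3))
```

Calling warp_at_scale(feat, flow, 1) reduces to plain warping at the native resolution, so the scale factor s can be varied to compare warping resolutions on the same features and flow.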
Findings
DGAF-VSR achieves a 35.82% reduction in DISTS, indicating improved perceptual quality.
It attains a 0.20 dB PSNR gain, indicating higher fidelity.
It reduces tLPIPS by 30.37%, indicating better temporal consistency.
Abstract
Diffusion model (DM) based Video Super-Resolution (VSR) approaches achieve impressive perceptual quality. However, they suffer from error accumulation, spatial artifacts, and a trade-off between perceptual quality and fidelity, primarily caused by inaccurate alignment and insufficient compensation between video frames. In this paper, within the DM-based VSR pipeline, we revisit the role of alignment and compensation between adjacent video frames and reveal two crucial observations: (a) the feature domain is better suited than the pixel domain for information compensation due to its stronger spatial and temporal correlations, and (b) warping at an upscaled resolution better preserves high-frequency information, but this benefit is not necessarily monotonic. Therefore, we propose a novel Densely Guided diffusion model with Aligned Features for Video Super-Resolution (DGAF-VSR), with an…
