RealisVSR: Detail-enhanced Diffusion for Real-World 4K Video Super-Resolution

Weisong Zhao; Jingkai Zhou; Xiangyu Zhu; Weihua Chen; Xiao-Yu Zhang; Zhen Lei; Fan Wang

arXiv:2507.19138 · eess.IV · July 28, 2025

TL;DR

RealisVSR introduces a novel diffusion-based approach with enhanced detail recovery and a new 4K benchmark, significantly improving real-world 4K video super-resolution performance.

Contribution

The paper presents a high-frequency detail-enhanced diffusion model with a new architecture, loss function, and 4K benchmark for improved real-world video super-resolution.

Findings

01 Outperforms existing methods on multiple benchmarks

02 Requires only 5-25% of the training data used by previous approaches

03 Excels in ultra-high-resolution video super-resolution

Abstract

Video Super-Resolution (VSR) has achieved significant progress through diffusion models, effectively addressing the over-smoothing issues inherent in GAN-based methods. Despite recent advances, three critical challenges persist in the VSR community: 1) inconsistent modeling of temporal dynamics in foundational models; 2) limited high-frequency detail recovery under complex real-world degradations; and 3) insufficient evaluation of detail enhancement and 4K super-resolution, as current methods primarily rely on 720P datasets with inadequate details. To address these challenges, we propose RealisVSR, a high-frequency detail-enhanced video diffusion model with three core innovations: 1) a Consistency Preserved ControlNet (CPC) architecture integrated with the Wan2.1 video diffusion model to model smooth and complex motions and suppress artifacts; 2) High-Frequency Rectified Diffusion Loss…

Code & Models

Datasets

WisonZws/RealisVideo-4K
dataset · 72 dl

Taxonomy

Topics: Advanced Image Processing Techniques · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis