SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution

Liangbin Xie; Yu Li; Shian Du; Menghan Xia; Xintao Wang; Fanghua Yu; Ziyan Chen; Pengfei Wan; Jiantao Zhou; Chao Dong

arXiv:2506.19838 · cs.CV · September 30, 2025

TL;DR

This paper introduces SimpleGVR, a lightweight cascaded video super-resolution (VSR) framework that improves both efficiency and output quality through a study of key design principles, new training strategies, and architectural enhancements.

Contribution

It proposes two degradation strategies for building training pairs, analyzes the behavior of VSR models, and introduces interleaving temporal units with sparse local attention for efficient high-resolution video synthesis.
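Sparse local attention is only named here, not specified. As a rough illustration of the general idea, the sketch below restricts self-attention to non-overlapping windows of a flattened token sequence, so each token attends to `window` neighbours instead of to every token. The function names, the 1-D windowing, and the NumPy setting are assumptions for illustration; SimpleGVR's actual attention pattern over spatiotemporal video tokens may differ.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_window_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, window: int) -> np.ndarray:
    """Self-attention restricted to non-overlapping windows of `window` tokens.

    q, k, v: arrays of shape (n_tokens, dim); n_tokens must be divisible by window.
    Each token attends only to tokens in its own window, so the score tensor is
    (n_tokens / window) blocks of size window x window instead of n_tokens x n_tokens.
    (Hypothetical illustration, not the paper's exact attention pattern.)
    """
    n, d = q.shape
    if n % window != 0:
        raise ValueError("n_tokens must be divisible by window")
    # Reshape into (num_windows, window, dim) so each window is an independent block.
    qw = q.reshape(n // window, window, d)
    kw = k.reshape(n // window, window, d)
    vw = v.reshape(n // window, window, d)
    scores = qw @ kw.transpose(0, 2, 1) / np.sqrt(d)  # (num_windows, window, window)
    out = softmax(scores, axis=-1) @ vw               # (num_windows, window, dim)
    return out.reshape(n, d)
```

With `window` equal to the sequence length this reduces to ordinary full attention; shrinking the window replaces the quadratic score matrix with a block-diagonal one, which is the kind of saving local attention aims for at high resolution.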

Findings

01. Outperforms existing methods in both quality and efficiency

02. Training on degradations that better mimic the base model's output improves results

03. Architectural innovations reduce computational overhead

Abstract

Latent diffusion models have emerged as a leading paradigm for efficient video generation. However, as user expectations shift toward higher-resolution outputs, relying solely on latent computation becomes inadequate. A promising approach decouples the process into two stages: semantic content generation and detail synthesis. The former employs a computationally intensive base model at lower resolutions, while the latter leverages a lightweight cascaded video super-resolution (VSR) model to achieve high-resolution output. In this work, we study key design principles for the latter, cascaded VSR models, which remain underexplored. First, we propose two degradation strategies to generate training pairs that better mimic the output characteristics of the base model, ensuring alignment between the VSR model and its upstream generator. Second, we provide critical…
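The abstract does not say here how the two degradation strategies work. To show only the general shape of building training pairs for a cascaded VSR model, the sketch below derives a low-resolution input from a clean high-resolution clip by average-pool downsampling plus Gaussian noise. This is a generic stand-in, not either of the paper's strategies; `make_training_pair`, the noise model, and the (T, H, W, C) layout are hypothetical choices.

```python
import numpy as np

def make_training_pair(hr_video: np.ndarray, scale: int = 4, noise_std: float = 0.02, seed=None):
    """Build a (low-res input, high-res target) pair from a clean HR clip.

    hr_video: float array of shape (T, H, W, C) with values in [0, 1];
    H and W must be divisible by `scale`. Hypothetical helper for illustration.
    """
    t, h, w, c = hr_video.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # Average-pool each scale x scale block to downsample spatially (frame count unchanged).
    lr = hr_video.reshape(t, h // scale, scale, w // scale, scale, c).mean(axis=(2, 4))
    # Gaussian noise is a crude, hypothetical proxy for base-model artifacts.
    rng = np.random.default_rng(seed)
    lr = lr + rng.normal(0.0, noise_std, size=lr.shape)
    # Keep the degraded input in the same [0, 1] range as the target.
    return np.clip(lr, 0.0, 1.0), hr_video
```

The abstract's point is that the degradation step should produce inputs that match the output characteristics of the upstream base model; in this sketch, the downsampling and noise terms are exactly the parts that would have to be replaced to achieve that alignment.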

Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment