InstaVSR: Taming Diffusion for Efficient and Temporally Consistent Video Super-Resolution

Jintong Hu; Bin Chen; Zhenyu Hu; Jiayue Liu; Guo Wang; Lu Qi

arXiv:2603.26134 · cs.CV · March 30, 2026

TL;DR

InstaVSR introduces an efficient, lightweight diffusion framework for video super-resolution that balances high perceptual quality with temporal stability and low computational cost.

Contribution

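The paper presents a simplified diffusion-based VSR method that combines a pruned one-step diffusion backbone, recurrent training with flow-guided temporal regularization, and dual-space adversarial learning in latent and pixel spaces, targeting both efficiency and temporal stability.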

Findings

01 Processes a 30-frame 2K video in under one minute on an RTX 4090.

02 Reduces computational cost compared to existing diffusion-based VSR methods.

03 Maintains perceptual quality while producing smoother temporal transitions.

Abstract

Video super-resolution (VSR) seeks to reconstruct high-resolution frames from low-resolution inputs. While diffusion-based methods have substantially improved perceptual quality, extending them to video remains challenging for two reasons: strong generative priors can introduce temporal instability, and multi-frame diffusion pipelines are often too expensive for practical deployment. To address both challenges simultaneously, we propose InstaVSR, a lightweight diffusion framework for efficient video super-resolution. InstaVSR combines three ingredients: (1) a pruned one-step diffusion backbone that removes several costly components from conventional diffusion-based VSR pipelines, (2) recurrent training with flow-guided temporal regularization to improve frame-to-frame stability, and (3) dual-space adversarial learning in latent and pixel spaces to preserve perceptual quality after…
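As a rough illustration of what "flow-guided temporal regularization" in ingredient (2) can look like, the sketch below warps the previous output frame onto the current one with a backward optical-flow field and penalizes the remaining L1 difference. This is a generic flow-warped consistency penalty, not the paper's loss: the abstract does not specify the exact formulation, the flow estimator, or any weighting, and the names `backward_warp`, `temporal_loss`, and `valid_mask` are hypothetical. NumPy is used only to keep the example self-contained.

```python
import numpy as np


def backward_warp(frame, flow):
    """Bilinearly sample `frame` (H, W, C) at positions displaced by `flow` (H, W, 2).

    Assumption: flow[y, x] = (dx, dy) points from pixel (x, y) in the current
    frame to its source location in `frame` (the previous output).
    Out-of-range sample positions are clamped to the image border.
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners of the sampling cell and fractional offsets inside it.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom


def temporal_loss(prev_out, curr_out, flow, valid_mask=None):
    """Mean L1 difference between the current output and the warped previous output.

    `valid_mask` (H, W), if given, excludes pixels where the flow is unreliable
    (for example occlusions or image borders).
    """
    diff = np.abs(curr_out - backward_warp(prev_out, flow))
    if valid_mask is None:
        return float(diff.mean())
    m = valid_mask[..., None].astype(diff.dtype)
    return float((diff * m).sum() / max(m.sum() * diff.shape[-1], 1.0))
```

In a recurrent training setup, a term like this would typically be added to the reconstruction and adversarial objectives for each consecutive pair of output frames, so that details hallucinated by the generative prior stay consistent along motion trajectories.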
