InstaVSR: Taming Diffusion for Efficient and Temporally Consistent Video Super-Resolution
Jintong Hu, Bin Chen, Zhenyu Hu, Jiayue Liu, Guo Wang, Lu Qi

TL;DR
InstaVSR introduces an efficient, lightweight diffusion framework for video super-resolution that balances high perceptual quality with temporal stability and low computational cost.
Contribution
The paper presents a pruned one-step diffusion-based VSR method that uses recurrent training with flow-guided temporal regularization for frame-to-frame stability and dual-space (latent and pixel) adversarial learning to preserve perceptual quality.
Findings
Processes a 30-frame 2K video in under one minute on an RTX 4090.
Reduces computational cost compared to existing diffusion VSR methods.
Maintains perceptual quality with smoother temporal transitions.
Abstract
Video super-resolution (VSR) seeks to reconstruct high-resolution frames from low-resolution inputs. While diffusion-based methods have substantially improved perceptual quality, extending them to video remains challenging for two reasons: strong generative priors can introduce temporal instability, and multi-frame diffusion pipelines are often too expensive for practical deployment. To address both challenges simultaneously, we propose InstaVSR, a lightweight diffusion framework for efficient video super-resolution. InstaVSR combines three ingredients: (1) a pruned one-step diffusion backbone that removes several costly components from conventional diffusion-based VSR pipelines, (2) recurrent training with flow-guided temporal regularization to improve frame-to-frame stability, and (3) dual-space adversarial learning in latent and pixel spaces to preserve perceptual quality after…
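The abstract names flow-guided temporal regularization but does not give its formula. As a rough illustration only, the sketch below shows one common form of such a penalty: warp the previous output frame into the current frame's coordinates using optical flow, then take the mean absolute difference. This is an assumption about the general technique, not the authors' loss. The names `bilinear_sample`, `warp`, and `temporal_loss` are hypothetical, frames are plain grayscale grids, and the flow is assumed to be given rather than estimated.

```python
from typing import List, Tuple

Frame = List[List[float]]                # grayscale frame, indexed [y][x]
Flow = List[List[Tuple[float, float]]]   # per-pixel (dx, dy), indexed [y][x]


def bilinear_sample(frame: Frame, x: float, y: float) -> float:
    """Sample a frame at fractional (x, y), clamping to the border."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
    bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(prev_frame: Frame, flow: Flow) -> Frame:
    """Backward-warp prev_frame into the current frame's coordinates.

    Assumed convention: flow[y][x] = (dx, dy) points from pixel (x, y) in the
    current frame to its source location in the previous frame.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    return [
        [bilinear_sample(prev_frame, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]


def temporal_loss(curr: Frame, prev: Frame, flow: Flow) -> float:
    """Mean absolute difference between curr and the flow-warped prev."""
    warped = warp(prev, flow)
    h, w = len(curr), len(curr[0])
    total = sum(abs(curr[y][x] - warped[y][x])
                for y in range(h) for x in range(w))
    return total / (h * w)


# Toy example: the scene shifts one pixel to the right between frames.
prev = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
curr = [[0.0, 0.0, 1.0], [3.0, 3.0, 4.0]]
true_flow = [[(-1.0, 0.0)] * 3 for _ in range(2)]
zero_flow = [[(0.0, 0.0)] * 3 for _ in range(2)]

loss_with_flow = temporal_loss(curr, prev, true_flow)
loss_without_flow = temporal_loss(curr, prev, zero_flow)
```

With the correct flow the warped previous frame lines up with the current one and the penalty vanishes, whereas ignoring motion leaves a nonzero residual. In a training setup this kind of term would typically be added to the reconstruction and adversarial losses and applied to the model's outputs for consecutive frames.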
