Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Hau-Shiang Shiu; Chin-Yang Lin; Zhixiang Wang; Chi-Wei Hsiao; Po-Fan Yu; Yu-Chih Chen; Yu-Lun Liu

arXiv:2512.23709 · cs.CV · April 7, 2026

TL;DR

Stream-DiffVSR is a causally conditioned diffusion framework for low-latency online video super-resolution. It conditions only on past frames and uses a four-step distilled denoiser, processing 720p frames in 0.328 seconds on an RTX 4090.

Contribution

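A causal, diffusion-based VSR method that runs strictly frame by frame. It combines a four-step distilled denoiser, an Auto-regressive Temporal Guidance (ARTG) module, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM), which makes it suitable for latency-sensitive streaming applications.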

Findings

01

Processes 720p frames in 0.328 seconds on an RTX 4090.

02

Outperforms prior diffusion-based VSR baselines in perceptual quality.

03

Reduces initial delay from over 4600 seconds to 0.328 seconds.
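
The latency figures in the findings can be put in perspective with a little arithmetic. The sketch below is illustrative only: the two constants come from the findings, while the function names, the chunked-delay model (a chunk-wise method emits nothing until the whole chunk is processed, at the same per-frame cost), and the chunk size of 16 are assumptions made for this example, not details from the paper.

```python
# Figures taken from the findings above.
SECONDS_PER_FRAME = 0.328        # 720p on an RTX 4090
BASELINE_INITIAL_DELAY = 4600.0  # "over 4600 seconds", so this is a lower bound


def frames_per_second(seconds_per_frame: float) -> float:
    """Throughput implied by a fixed per-frame processing time."""
    return 1.0 / seconds_per_frame


def first_frame_delay_causal(seconds_per_frame: float) -> float:
    """Frame-by-frame causal inference: the first output needs one frame of work."""
    return seconds_per_frame


def first_frame_delay_chunked(chunk_size: int, seconds_per_frame: float) -> float:
    """Assumed model: a chunk-wise method must finish the whole chunk first."""
    return chunk_size * seconds_per_frame


def delay_reduction_factor(before: float, after: float) -> float:
    """How many times smaller the new delay is than the old one."""
    return before / after


if __name__ == "__main__":
    print(frames_per_second(SECONDS_PER_FRAME))
    print(first_frame_delay_causal(SECONDS_PER_FRAME))
    print(first_frame_delay_chunked(16, SECONDS_PER_FRAME))  # hypothetical chunk size
    print(delay_reduction_factor(BASELINE_INITIAL_DELAY, SECONDS_PER_FRAME))
```

Under these assumptions, 0.328 seconds per frame corresponds to roughly 3 frames per second, and the reported drop in initial delay is a factor of more than 14,000.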

Abstract

Diffusion-based video super-resolution (VSR) methods deliver strong perceptual quality but are often unsuitable for latency-sensitive scenarios due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, Stream-DiffVSR integrates a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) to enhance detail and temporal coherence. Unlike chunk-wise streaming inference, our strictly frame-by-frame causal design avoids sequence-level waiting, substantially reducing time-to-first-frame and end-to-end latency. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX 4090…
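
The control flow the abstract describes (four denoising steps per frame, guidance from the previous output, a decoder that also sees the previous output, and no access to future frames) can be sketched as a generator. This is a structural toy, not the authors' implementation: `toy_denoise_step`, `toy_decode`, and `stream_super_resolve` are hypothetical names, frames are plain lists of floats, the arithmetic is placeholder blending, and the sketch performs no motion alignment or actual upscaling.

```python
from typing import Iterable, Iterator, List, Optional

Frame = List[float]  # toy stand-in for an image or latent tensor

NUM_DENOISE_STEPS = 4  # the abstract states a four-step distilled denoiser


def toy_denoise_step(latent: Frame, lowres: Frame, guidance: Optional[Frame]) -> Frame:
    """Placeholder for one denoising step; NOT the paper's network."""
    out = []
    for i, value in enumerate(latent):
        target = lowres[i]
        if guidance is not None:
            # Blend in a cue from the previous output (a stand-in for ARTG).
            target = 0.5 * (target + guidance[i])
        out.append(0.5 * (value + target))
    return out


def toy_decode(latent: Frame, previous_output: Optional[Frame]) -> Frame:
    """Placeholder for the temporal-aware decoder; NOT the paper's TPM."""
    if previous_output is None:
        return list(latent)
    return [0.9 * a + 0.1 * b for a, b in zip(latent, previous_output)]


def stream_super_resolve(lowres_frames: Iterable[Frame]) -> Iterator[Frame]:
    """Yield one output per input frame, using only current and past frames."""
    previous_output: Optional[Frame] = None
    for lowres in lowres_frames:
        # Deterministic start in place of random noise, to keep the toy testable.
        latent = [0.0] * len(lowres)
        for _ in range(NUM_DENOISE_STEPS):
            latent = toy_denoise_step(latent, lowres, previous_output)
        output = toy_decode(latent, previous_output)
        yield output  # emitted before the next input frame is read
        previous_output = output


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for result in stream_super_resolve(frames):
        print(result)
```

Because the generator yields each output before pulling the next input, the first result is available after a single frame of work, which is the property that distinguishes frame-by-frame causal inference from chunk-wise streaming.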

Code & Models

Models

🤗 Jamichsu/Stream-DiffVSR
model · 362 downloads · 33 likes
