TL;DR
Stream-DiffVSR is a causally conditioned diffusion framework for low-latency online video super-resolution. It conditions only on past frames and processes video frame by frame, which cuts time-to-first-frame and end-to-end latency on streaming video.
Contribution
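A causal diffusion-based VSR method that combines a four-step distilled denoiser, an Auto-regressive Temporal Guidance (ARTG) module, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM), giving fast inference and reduced latency suitable for real-time streaming applications.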
Findings
Processes 720p frames in 0.328 seconds on an RTX 4090.
Outperforms prior diffusion-based VSR baselines in perceptual quality.
Reduces initial delay from over 4600 seconds to 0.328 seconds.
Abstract
Diffusion-based video super-resolution (VSR) methods deliver strong perceptual quality but are often unsuitable for latency-sensitive scenarios due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, Stream-DiffVSR integrates a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) to enhance detail and temporal coherence. Unlike chunk-wise streaming inference, our strictly frame-by-frame causal design avoids sequence-level waiting, substantially reducing time-to-first-frame and end-to-end latency. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX 4090…
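The strictly causal, frame-by-frame design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `stream_causal`, `step`, and `time_to_first_frame` are hypothetical names, `step` stands in for the whole per-frame model (denoiser, ARTG, decoder), and the delay model is a deliberate simplification that ignores frame arrival time and assumes cost grows linearly with the number of frames that must be processed before anything is emitted.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

LR = TypeVar("LR")  # low-resolution input frame (any type)
HR = TypeVar("HR")  # high-resolution output frame (any type)


def stream_causal(
    frames: Iterable[LR],
    step: Callable[[LR, Optional[HR]], HR],
) -> Iterator[HR]:
    """Yield one output per input frame, using only past information.

    `step` is a hypothetical stand-in for the per-frame model. It receives
    the current low-resolution frame and the previous output (None for the
    first frame) and never sees future frames.
    """
    prev_out: Optional[HR] = None
    for lr in frames:
        prev_out = step(lr, prev_out)
        # Emit immediately: no waiting for a chunk or the full sequence.
        yield prev_out


def time_to_first_frame(per_frame_s: float, frames_before_first_output: int) -> float:
    """Simplified delay model (an assumption, not taken from the paper).

    A causal frame-by-frame method needs 1 frame before its first output;
    a chunk-wise or offline method needs a whole chunk or sequence.
    """
    return per_frame_s * frames_before_first_output
```

With a toy `step` such as `lambda lr, prev: lr * 4 + (prev or 0)`, each output depends only on the current input and the previous output, which is the property that lets the first result be emitted after a single frame rather than after a full chunk.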
