DiffST: Spatiotemporal-Aware Diffusion for Real-World Space-Time Video Super-Resolution

Zheng Chen; Ruofan Yang; Jin Han; Dehua Song; Zichen Zou; Chunming He; Yong Guo; Yulun Zhang

arXiv:2605.13182·cs.CV·May 14, 2026

TL;DR

DiffST is an efficient spatiotemporal-aware diffusion framework for real-world space-time video super-resolution (STVSR). It runs about 17 times faster than previous diffusion-based methods and makes fuller use of spatiotemporal information.

Contribution

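The paper proposes DiffST, a diffusion-based STVSR model. For efficiency, it adapts a pre-trained diffusion model for one-step sampling and processes the entire video directly rather than frame by frame. For spatiotemporal modeling, it adds cross-frame context aggregation (CFCA) and video representation guidance (VRG).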

Findings

01. DiffST achieves state-of-the-art results on real-world STVSR tasks.

02. It runs about 17 times faster than previous diffusion-based methods.

03. Extensive experiments validate the effectiveness of the CFCA and VRG modules.

Abstract

Diffusion-based models have shown strong performance in video super-resolution (VSR) and video frame interpolation (VFI). However, their role in the coupled space-time video super-resolution (STVSR) setting remains limited. Existing diffusion-based STVSR approaches suffer from two issues: (1) low inference efficiency and (2) insufficient utilization of spatiotemporal information. These limitations impede deployment. To address these issues, we introduce DiffST, an efficient spatiotemporal-aware video diffusion framework for real-world STVSR. To improve efficiency, we adapt a pre-trained diffusion model for one-step sampling and process the entire video directly rather than operating on individual frames. Furthermore, to enhance spatiotemporal information utilization, we introduce cross-frame context aggregation (CFCA) and video representation guidance (VRG). The CFCA module aggregates…
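To make the coupled space-time setting concrete, the following minimal sketch computes the output size an STVSR model would have to produce for a given input clip. It is only an illustration of the task definition: the function name `stvsr_output_shape`, the default scale factors (4x spatial, 2x temporal), and the convention that new frames are inserted between existing ones are assumptions, not details taken from the paper.

```python
from typing import Tuple


def stvsr_output_shape(
    num_frames: int,
    height: int,
    width: int,
    space_scale: int = 4,  # assumed spatial upscaling factor (illustrative)
    time_scale: int = 2,   # assumed temporal upscaling factor (illustrative)
) -> Tuple[int, int, int]:
    """Return (frames, height, width) of the STVSR output.

    Assumes interpolated frames are inserted between consecutive input
    frames, so the first and last input frames are kept as endpoints.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to interpolate between")
    if space_scale < 1 or time_scale < 1:
        raise ValueError("scale factors must be positive integers")

    # Each of the (num_frames - 1) gaps is split into time_scale intervals.
    out_frames = (num_frames - 1) * time_scale + 1
    return out_frames, height * space_scale, width * space_scale


if __name__ == "__main__":
    # A 4-frame 320x180 clip under the assumed 4x spatial / 2x temporal factors.
    print(stvsr_output_shape(4, 180, 320))
```

Under these assumptions, both the frame count and the resolution grow in a single pass, which is what distinguishes STVSR from running VSR and VFI as separate stages.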

Code & Models

Repositories

zhengchen1999/DiffST (GitHub)
