DiTVR: Zero-Shot Diffusion Transformer for Video Restoration

Sicheng Gao; Nancy Mehta; Zongwei Wu; Radu Timofte

arXiv:2508.07811 · cs.CV · August 12, 2025

TL;DR

DiTVR is a zero-shot video restoration method that combines a diffusion transformer with flow-aware attention and wavelet-guided sampling to improve temporal consistency and detail preservation without paired training data.

Contribution

The paper introduces DiTVR, a zero-shot framework that integrates trajectory-aware attention and flow-guided sampling, advancing video restoration without reliance on extensive training data.

Findings

1. Achieves state-of-the-art zero-shot performance on video restoration benchmarks.

2. Enhances temporal consistency and detail preservation compared to prior methods.

3. Robust to flow noise and occlusions in various video restoration tasks.

Abstract

Video restoration aims to reconstruct high-quality video sequences from low-quality inputs, addressing tasks such as super-resolution, denoising, and deblurring. Traditional regression-based methods often produce unrealistic details and require extensive paired datasets, while recent generative diffusion models struggle to ensure temporal consistency. We introduce DiTVR, a zero-shot video restoration framework that couples a diffusion transformer with trajectory-aware attention and a wavelet-guided, flow-consistent sampler. Unlike prior 3D convolutional or frame-wise diffusion approaches, our attention mechanism aligns tokens along optical flow trajectories, with particular emphasis on vital layers that exhibit the highest sensitivity to temporal dynamics. A spatiotemporal neighbour cache dynamically selects relevant tokens based on motion correspondences across frames. The…
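To make the idea of aligning tokens along optical flow trajectories concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `follow_trajectory` and `trajectory_attention`, the integer-valued flow layout, the clamping at the frame border, and the use of raw tokens as query, key, and value (no learned projections, no neighbour cache) are all hypothetical choices made for this example.

```python
import math


def follow_trajectory(flows, start, height, width):
    """Follow per-frame displacement fields from `start` in frame 0.

    flows[t][y][x] is a (dy, dx) displacement from frame t to frame t+1.
    This integer-valued layout is an assumption made for the sketch.
    Returns one (y, x) position per frame, clamped to the token grid.
    """
    y, x = start
    path = [(y, x)]
    for flow in flows:
        dy, dx = flow[y][x]
        y = min(max(y + dy, 0), height - 1)
        x = min(max(x + dx, 0), width - 1)
        path.append((y, x))
    return path


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def trajectory_attention(frames, flows, start):
    """Attend from the token at `start` in frame 0 to the tokens on its trajectory.

    frames[t][y][x] is a feature vector (list of floats). Tokens serve as
    query, key, and value directly; a real transformer layer would apply
    learned projections first.
    """
    height, width = len(frames[0]), len(frames[0][0])
    path = follow_trajectory(flows, start, height, width)
    query = frames[0][start[0]][start[1]]
    keys = [frames[t][y][x] for t, (y, x) in enumerate(path)]

    # Scaled dot-product scores between the query and each trajectory token.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)

    # Weighted sum of the trajectory tokens (keys double as values here).
    output = [
        sum(w * key[i] for w, key in zip(weights, keys))
        for i in range(len(query))
    ]
    return path, weights, output
```

In this sketch each query attends only to one token per frame, the one its flow trajectory passes through, rather than to every token in the clip. That restriction is what ties the attention to motion correspondences instead of fixed spatial positions.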

Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging