DiTVR: Zero-Shot Diffusion Transformer for Video Restoration
Sicheng Gao, Nancy Mehta, Zongwei Wu, Radu Timofte

TL;DR
DiTVR is a novel zero-shot video restoration method that couples a diffusion transformer with flow-aware (trajectory-aware) attention and wavelet-based sampling, improving temporal consistency and detail preservation without requiring paired training datasets.
Contribution
The paper introduces DiTVR, a zero-shot framework that integrates trajectory-aware attention and flow-guided sampling into a diffusion transformer, so video restoration does not depend on large paired training datasets.
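The summary names flow-guided, wavelet-based sampling but does not spell out the update rule. One plausible reading, offered only as a minimal NumPy sketch under our own assumptions and not as DiTVR's sampler: at a denoising step, take the current clean-frame estimate, warp the previous frame's estimate into it with optical flow, and blend only the low-frequency (LL) band of a one-level Haar wavelet transform, so temporal consistency is enforced on coarse structure while the current frame's high-frequency detail is kept. The function names, the (dy, dx) flow layout, nearest-neighbour warping, and the blend weight `alpha` are all hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform of an (H, W) array; H and W must be even."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # detail: difference between neighbouring columns
    hl = (a + b - c - d) / 2.0  # detail: difference between neighbouring rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Exact inverse of haar_dwt2 (the 4x4 Haar mixing matrix is its own inverse)."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def warp_nearest(prev, flow_to_prev):
    """Backward-warp prev (H, W) with flow_to_prev (H, W, 2) holding (dy, dx)
    from each current-frame pixel to its match in the previous frame.
    Nearest-neighbour rounding and border clamping keep the sketch short."""
    h, w = prev.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow_to_prev[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow_to_prev[..., 1]).astype(int), 0, w - 1)
    return prev[src_y, src_x]


def flow_consistent_blend(x0_cur, x0_prev, flow_to_prev, alpha=0.5):
    """Hypothetical sampler step: pull the LL band of the current clean-frame
    estimate toward the flow-warped previous estimate, while keeping the
    current frame's high-frequency bands untouched."""
    warped = warp_nearest(x0_prev, flow_to_prev)
    ll_cur, highs_cur = haar_dwt2(x0_cur)
    ll_prev, _ = haar_dwt2(warped)
    ll_mix = (1.0 - alpha) * ll_cur + alpha * ll_prev
    return haar_idwt2(ll_mix, highs_cur)
```

Here `alpha` trades temporal consistency against fidelity to the current frame: `alpha=0` returns the current estimate unchanged, while `alpha=1` copies the warped previous frame's coarse structure but still keeps the current frame's detail bands. A practical sampler would more likely use bilinear warping and an occlusion mask, which this sketch omits.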
Findings
Achieves state-of-the-art zero-shot performance on video restoration benchmarks.
Enhances temporal consistency and detail preservation compared to prior methods.
Robust to flow noise and occlusions in various video restoration tasks.
Abstract
Video restoration aims to reconstruct high-quality video sequences from low-quality inputs, addressing tasks such as super-resolution, denoising, and deblurring. Traditional regression-based methods often produce unrealistic details and require extensive paired datasets, while recent generative diffusion models face challenges in ensuring temporal consistency. We introduce DiTVR, a zero-shot video restoration framework that couples a diffusion transformer with trajectory-aware attention and a wavelet-guided, flow-consistent sampler. Unlike prior 3D convolutional or frame-wise diffusion approaches, our attention mechanism aligns tokens along optical flow trajectories, with particular emphasis on vital layers that exhibit the highest sensitivity to temporal dynamics. A spatiotemporal neighbour cache dynamically selects relevant tokens based on motion correspondences across frames. The…
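The abstract describes attention that aligns tokens along optical flow trajectories and a spatiotemporal neighbour cache that selects tokens by motion correspondence, but gives no equations. The sketch below is a minimal NumPy illustration under our own assumptions, not DiTVR's implementation: tokens sit on a per-frame (H, W) grid, forward and backward flows are given in token units as (dy, dx), a query token's trajectory is traced with rounded flow lookups, a small spatial window around each trajectory point stands in for the neighbour cache, and single-head softmax attention runs without learned query/key/value projections. All function names and parameters are hypothetical.

```python
import numpy as np


def trace_trajectory(flow_fwd, flow_bwd, t0, y, x):
    """Follow one token through every frame using rounded flow lookups.

    flow_fwd[t] maps frame t -> t+1 and flow_bwd[t] maps frame t+1 -> t;
    both have shape (H, W, 2) with (dy, dx) in token units (assumed layout).
    """
    T = flow_fwd.shape[0] + 1
    H, W = flow_fwd.shape[1:3]
    path = {t0: (y, x)}
    cy, cx = y, x
    for t in range(t0, T - 1):  # walk forward in time
        dy, dx = flow_fwd[t, cy, cx]
        cy = int(np.clip(np.rint(cy + dy), 0, H - 1))
        cx = int(np.clip(np.rint(cx + dx), 0, W - 1))
        path[t + 1] = (cy, cx)
    cy, cx = y, x
    for t in range(t0 - 1, -1, -1):  # walk backward in time
        dy, dx = flow_bwd[t, cy, cx]
        cy = int(np.clip(np.rint(cy + dy), 0, H - 1))
        cx = int(np.clip(np.rint(cx + dx), 0, W - 1))
        path[t] = (cy, cx)
    return [(t, *path[t]) for t in range(T)]


def gather_neighbours(tokens, path, radius=1):
    """Stand-in for a neighbour cache: collect tokens in a (2r+1)x(2r+1)
    window (clipped at borders) around each trajectory point."""
    T, H, W, C = tokens.shape
    keys = []
    for t, cy, cx in path:
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, H)
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, W)
        keys.append(tokens[t, y0:y1, x0:x1].reshape(-1, C))
    return np.concatenate(keys, axis=0)


def trajectory_attention(tokens, flow_fwd, flow_bwd, t0, y, x, radius=1):
    """Softmax attention of one query token over tokens near its trajectory."""
    path = trace_trajectory(flow_fwd, flow_bwd, t0, y, x)
    kv = gather_neighbours(tokens, path, radius)  # keys and values share features
    q = tokens[t0, y, x]
    scores = kv @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ kv, path
```

For example, with a constant forward flow of one token to the right and a matching backward flow of one token to the left, a query at frame 1, position (2, 1) follows the path (0, 2, 0), (1, 2, 1), (2, 2, 2), and attention mixes only the tokens near that path rather than every token in the clip. This restriction is what keeps the sketch's cost proportional to frames times window size instead of to the full spatiotemporal volume.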
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
