TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion

Nurislam Tursynbek; Zhiqiang Lao; Heather Yu; Gedas Bertasius; Marc Niethammer

arXiv:2605.14136·cs.CV·May 15, 2026

TL;DR

TeDiO is a training-free method that enhances temporal coherence in video diffusion models by regularizing internal attention patterns during inference, resulting in smoother motion without retraining.

Contribution

Introduces TeDiO, a novel inference-time regularization technique that improves temporal consistency in video diffusion models without additional training or external supervision.

Findings

01. TeDiO significantly reduces flickering and unstable motion in generated videos.

02. It maintains high visual quality while improving temporal coherence.

03. It is applicable across multiple video diffusion architectures.

Abstract

Recent text-to-video diffusion transformers generate visually compelling frames, yet still struggle with temporal coherence, often producing flickering, drifting, or unstable motion. We show that these failures leave a clear imprint inside the model: incoherent videos consistently exhibit irregular, fragmented temporal diagonals in their intermediate self-attention maps, whereas stable motion corresponds to smooth, band-diagonal patterns. Building on this observation, we introduce TeDiO, a training-free, inference-time method that reinforces temporal consistency by regularizing these internal attention patterns. TeDiO estimates diagonal smoothness, identifies unstable regions, and performs lightweight latent updates that promote coherent frame-to-frame dynamics, without modifying model weights or using external motion supervision. Across multiple video diffusion models (e.g., Wan2.1,…
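
As a rough illustration of what "smooth, band-diagonal" versus "irregular, fragmented" temporal attention could mean numerically, the sketch below scores a frame-to-frame attention matrix in two ways. It is not the authors' estimator: the function names `band_diagonal_mass` and `diagonal_roughness`, the `bandwidth` parameter, and the assumption that attention has already been reduced to a square frames-by-frames matrix are all hypothetical choices made for this example.

```python
import numpy as np


def band_diagonal_mass(attn, bandwidth=1):
    """Fraction of total attention mass lying within `bandwidth` frames of the main diagonal.

    Hypothetical metric for illustration; `attn` is assumed to be a square
    (frames x frames) non-negative matrix.
    """
    attn = np.asarray(attn, dtype=float)
    idx = np.arange(attn.shape[0])
    band = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    return float(attn[band].sum() / attn.sum())


def diagonal_roughness(attn, bandwidth=1):
    """Mean variance of the values along each diagonal inside the band.

    Hypothetical metric for illustration; 0.0 means every in-band diagonal
    is constant, larger values mean more fragmented diagonals.
    """
    attn = np.asarray(attn, dtype=float)
    variances = [
        np.var(np.diagonal(attn, offset=k))
        for k in range(-bandwidth, bandwidth + 1)
    ]
    return float(np.mean(variances))


# Toy inputs (assumed, not taken from the paper): a clean tridiagonal map
# and a random map standing in for an incoherent one.
smooth = 0.5 * np.eye(8) + 0.25 * np.eye(8, k=1) + 0.25 * np.eye(8, k=-1)
noisy = np.random.default_rng(0).random((8, 8))

for name, attn in [("smooth", smooth), ("noisy", noisy)]:
    print(name, band_diagonal_mass(attn), diagonal_roughness(attn))
```

Under these assumptions, the tridiagonal map keeps all of its mass inside the band and has zero roughness, while the random map scores worse on both. A method along the lines the abstract describes would presumably use some score of this kind to decide where latent updates are needed, but the exact formulation is not given in the text shown here.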
