TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion
Nurislam Tursynbek, Zhiqiang Lao, Heather Yu, Gedas Bertasius, Marc Niethammer

TL;DR
TeDiO is a training-free method that enhances temporal coherence in video diffusion models by regularizing internal attention patterns during inference, resulting in smoother motion without retraining.
Contribution
Introduces TeDiO, a novel inference-time regularization technique that improves temporal consistency in video diffusion models without additional training or external supervision.
Findings
TeDiO reduces flickering, drifting, and unstable motion in generated videos.
It maintains high visual quality while improving temporal coherence.
It applies across multiple video diffusion architectures.
Abstract
Recent text-to-video diffusion transformers generate visually compelling frames, yet still struggle with temporal coherence, often producing flickering, drifting, or unstable motion. We show that these failures leave a clear imprint inside the model: incoherent videos consistently exhibit irregular, fragmented temporal diagonals in their intermediate self-attention maps, whereas stable motion corresponds to smooth, band-diagonal patterns. Building on this observation, we introduce TeDiO, a training-free, inference-time method that reinforces temporal consistency by regularizing these internal attention patterns. TeDiO estimates diagonal smoothness, identifies unstable regions, and performs lightweight latent updates that promote coherent frame-to-frame dynamics, without modifying model weights or using external motion supervision. Across multiple video diffusion models (e.g., Wan2.1,…
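The abstract's contrast between smooth band-diagonal attention and fragmented diagonals can be made concrete with a small sketch. This is an illustration only, not the authors' formulation: the summary does not give TeDiO's actual smoothness estimate, so the two scores below (`band_diagonal_mass` and `diagonal_roughness`) and the toy 4-frame attention maps are hypothetical stand-ins. The latent update step is not shown.

```python
# Illustrative only: hypothetical scores for how "band-diagonal" a
# frame-to-frame attention map is. Not the metric used in the paper.

def band_diagonal_mass(attn, bandwidth=1):
    """Fraction of total attention mass within `bandwidth` of the main diagonal."""
    total = sum(sum(row) for row in attn)
    near = sum(
        attn[i][j]
        for i in range(len(attn))
        for j in range(len(attn[i]))
        if abs(i - j) <= bandwidth
    )
    return near / total


def diagonal_roughness(attn):
    """Mean absolute change between consecutive main-diagonal entries."""
    diag = [attn[i][i] for i in range(len(attn))]
    steps = [abs(b - a) for a, b in zip(diag, diag[1:])]
    return sum(steps) / len(steps)


# Toy temporal attention maps (rows = query frame, columns = key frame).
# Each row sums to 1, as a softmax output would.
smooth = [
    [0.6, 0.4, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.4, 0.6],
]
fragmented = [
    [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.9],
    [0.5, 0.0, 0.5, 0.0],
    [0.7, 0.2, 0.0, 0.1],
]

for name, attn in (("smooth", smooth), ("fragmented", fragmented)):
    print(name, band_diagonal_mass(attn), diagonal_roughness(attn))
```

Under these assumed scores, the smooth map concentrates more mass near the diagonal and has a flatter diagonal than the fragmented one. A method in the spirit of TeDiO would use some signal of this kind to decide where to apply its latent updates.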
