Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-Michel Morel

TL;DR
This paper introduces a novel vectorized timestep approach for video diffusion models, enabling each frame to have an independent noise schedule, which improves the modeling of complex temporal dependencies in video generation tasks.
Contribution
The paper proposes a frame-aware video diffusion model with a vectorized timestep variable, enhancing temporal modeling and outperforming existing methods across multiple video generation tasks.
Findings
Outperforms state-of-the-art methods in video generation quality
Handles long video synthesis effectively
Overcomes catastrophic forgetting during fine-tuning
Abstract
Diffusion models have revolutionized image generation, and their extension to video generation has shown promise. However, current video diffusion models (VDMs) rely on a scalar timestep variable applied at the clip level, which limits their ability to model complex temporal dependencies needed for various tasks like image-to-video generation. To address this limitation, we propose a frame-aware video diffusion model (FVDM), which introduces a novel vectorized timestep variable (VTV). Unlike conventional VDMs, our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies. FVDM's flexibility is demonstrated across multiple tasks, including standard video generation, image-to-video generation, video interpolation, and long video synthesis. Through a diverse set of VTV configurations, we achieve superior…
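To make the difference between a scalar timestep and a vectorized timestep concrete, here is a minimal sketch of the forward (noising) step with one timestep per frame. It is not the authors' implementation: the DDPM-style linear beta schedule, the function names `make_alpha_bar` and `add_noise_per_frame`, the tensor layout `(frames, height, width, channels)`, and the example timestep configurations are all assumptions chosen for illustration.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention terms for a linear beta schedule.

    Assumption: a standard DDPM-style schedule; the abstract does not
    state which schedule FVDM uses.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise_per_frame(video, timesteps, alpha_bar, rng):
    """Noise each frame of `video` at its own timestep.

    video:     float array of shape (frames, height, width, channels)
    timesteps: int array of shape (frames,), one timestep per frame
    Returns the noised video and the noise that was added.
    """
    timesteps = np.asarray(timesteps)
    if timesteps.shape != (video.shape[0],):
        raise ValueError("need exactly one timestep per frame")
    # Reshape so each frame's coefficient broadcasts over H, W, C.
    a = alpha_bar[timesteps].reshape(-1, 1, 1, 1)
    noise = rng.standard_normal(video.shape)
    noisy = np.sqrt(a) * video + np.sqrt(1.0 - a) * noise
    return noisy, noise


rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16, 16, 3))  # toy 8-frame clip
alpha_bar = make_alpha_bar()

# Conventional clip-level setting: every frame shares one timestep.
scalar_t = np.full(8, 500)
noisy_scalar, _ = add_noise_per_frame(video, scalar_t, alpha_bar, rng)

# Hypothetical image-to-video-style configuration: the first frame is
# kept almost clean (t = 0) while the remaining frames are fully noised.
i2v_t = np.array([0] + [999] * 7)
noisy, _ = add_noise_per_frame(video, i2v_t, alpha_bar, rng)
```

In this sketch the scalar-timestep case is just the special case where all entries of the timestep vector are equal, which is one way to read the claim that a vectorized timestep generalizes clip-level noising.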
Taxonomy
Topics: Multimedia Communication and Technology · Video Coding and Compression Technologies
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · Diffusion
