Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach

Yaofang Liu; Yumeng Ren; Xiaodong Cun; Aitor Artola; Yang Liu; Tieyong Zeng; Raymond H. Chan; Jean-Michel Morel

arXiv:2410.03160·cs.CV·October 7, 2024

TL;DR

This paper introduces a novel vectorized timestep approach for video diffusion models, enabling each frame to have an independent noise schedule, which improves the modeling of complex temporal dependencies in video generation tasks.

Contribution

The paper proposes a frame-aware video diffusion model with a vectorized timestep variable, enhancing temporal modeling and outperforming existing methods across multiple video generation tasks.

Findings

01. Outperforms state-of-the-art in video quality
02. Handles long video synthesis effectively
03. Overcomes catastrophic forgetting during fine-tuning

Abstract

Diffusion models have revolutionized image generation, and their extension to video generation has shown promise. However, current video diffusion models (VDMs) rely on a scalar timestep variable applied at the clip level, which limits their ability to model complex temporal dependencies needed for various tasks like image-to-video generation. To address this limitation, we propose a frame-aware video diffusion model (FVDM), which introduces a novel vectorized timestep variable (VTV). Unlike conventional VDMs, our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies. FVDM's flexibility is demonstrated across multiple tasks, including standard video generation, image-to-video generation, video interpolation, and long video synthesis. Through a diverse set of VTV configurations, we achieve superior…
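The idea of a per-frame timestep can be illustrated with a small sketch. This is not the authors' implementation: the linear beta schedule, the helper names (`alpha_bar`, `make_timestep_vector`, `add_noise`), and the choice of pinning conditioning frames to timestep 0 are all assumptions made for illustration. It shows only how a vector of timesteps, one per frame, lets some frames stay clean while others are noised.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction after t steps of a linear beta schedule.

    The schedule itself is an assumption; the abstract does not specify one.
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def make_timestep_vector(num_frames, t, clean_frames=()):
    """Hypothetical helper: one timestep per frame.

    Frames listed in clean_frames are held at timestep 0 (no noise).
    """
    return [0 if i in clean_frames else t for i in range(num_frames)]


def add_noise(video, timesteps, rng):
    """Noise each frame according to its own timestep.

    video: list of frames, each a flat list of floats.
    timesteps: list of ints, same length as video.
    """
    noisy = []
    for frame, t in zip(video, timesteps):
        a = alpha_bar(t)
        signal, sigma = math.sqrt(a), math.sqrt(1.0 - a)
        noisy.append([signal * x + sigma * rng.gauss(0.0, 1.0) for x in frame])
    return noisy


rng = random.Random(0)
video = [[1.0, -1.0, 0.5] for _ in range(4)]  # 4 toy frames, 3 values each

# A conventional scalar timestep is the special case where all entries agree.
scalar_like = make_timestep_vector(4, t=500)
# Image-to-video: keep the first frame clean, noise the rest (assumed setup).
image_to_video = make_timestep_vector(4, t=500, clean_frames={0})
# Interpolation: keep the first and last frames clean (assumed setup).
interpolation = make_timestep_vector(4, t=500, clean_frames={0, 3})

noisy = add_noise(video, image_to_video, rng)
```

In this sketch the first frame of `noisy` is identical to the input frame because its timestep is 0, while the remaining frames are perturbed. A scalar-timestep model corresponds to the `scalar_like` vector, where every frame shares the same noise level.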

Code & Models

Repositories

yaofang-liu/fvdm (Official)

Taxonomy

Topics: Multimedia Communication and Technology · Video Coding and Compression Technologies

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · Diffusion