Characterizing Motion Encoding in Video Diffusion Timesteps

Vatsal Baherwani; Yixuan Ren; Abhinav Shrivastava

arXiv:2512.22175 · cs.CV · December 30, 2025

TL;DR

This paper systematically characterizes how motion and appearance are encoded across timesteps in video diffusion models, revealing an early motion-dominant and a later appearance-dominant regime, and simplifies motion transfer by focusing on the motion-dominant phase.

Contribution

The paper introduces a quantitative protocol for mapping the motion-appearance trade-off across diffusion timesteps, and proposes a simplified motion transfer method based on this characterization.

Findings

01

Identifies an early, motion-dominant regime in diffusion timesteps.

02

Establishes a late, appearance-dominant regime in diffusion timesteps.

03

Enables strong motion transfer without auxiliary modules or specialized objectives.

Abstract

Text-to-video diffusion models synthesize temporal motion and spatial appearance through iterative denoising, yet how motion is encoded across timesteps remains poorly understood. Practitioners often exploit the empirical heuristic that early timesteps mainly shape motion and layout while later ones refine appearance, but this behavior has not been systematically characterized. In this work, we proxy motion encoding in video diffusion timesteps by the trade-off between appearance editing and motion preservation induced when injecting new conditions over specified timestep ranges, and characterize this proxy through a large-scale quantitative study. This protocol allows us to factor motion from appearance by quantitatively mapping how they compete along the denoising trajectory. Across diverse architectures, we consistently identify an early, motion-dominant regime and a later,…
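The protocol of injecting a new condition over a specified timestep range can be made concrete with a small scheduling sketch. This is not the authors' code: the names `InjectionWindow`, `condition_schedule`, and `sweep_windows`, the convention that step 0 is the first (noisiest) denoising step, the use of plain strings as stand-ins for text-condition embeddings, and the fixed-width sliding-window sweep are all assumptions made for illustration. Scoring appearance editing and motion preservation for each window would require a video diffusion model and metrics that this abstract excerpt does not specify, so that part is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InjectionWindow:
    """Hypothetical half-open range [start, end) of denoising-step indices.

    Assumption: step 0 is the first, noisiest denoising step.
    """

    start: int
    end: int

    def contains(self, step: int) -> bool:
        return self.start <= step < self.end


def condition_schedule(
    num_steps: int, window: InjectionWindow, source_cond: str, new_cond: str
) -> List[str]:
    """Return the condition used at each step: new_cond inside the window, source_cond elsewhere.

    Strings stand in for text-condition embeddings (an illustrative assumption).
    """
    if not 0 <= window.start <= window.end <= num_steps:
        raise ValueError("window must lie within [0, num_steps]")
    return [
        new_cond if window.contains(step) else source_cond
        for step in range(num_steps)
    ]


def sweep_windows(num_steps: int, width: int, stride: int) -> List[InjectionWindow]:
    """Enumerate fixed-width windows at a fixed stride (an assumed sweep, not the paper's)."""
    if width <= 0 or stride <= 0 or width > num_steps:
        raise ValueError("need 0 < width <= num_steps and stride > 0")
    return [
        InjectionWindow(start, start + width)
        for start in range(0, num_steps - width + 1, stride)
    ]


if __name__ == "__main__":
    # Use the edited prompt only during the first 3 of 10 denoising steps.
    schedule = condition_schedule(10, InjectionWindow(0, 3), "a dog running", "a cat running")
    print(schedule)
    # Five non-overlapping 10-step windows covering a 50-step trajectory.
    for w in sweep_windows(50, width=10, stride=10):
        print(w.start, w.end)
```

In an actual pipeline, each schedule entry would select which condition the denoiser receives at that step; sweeping windows and scoring each resulting video for appearance change and motion fidelity would then trace the kind of trade-off curve the abstract describes.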

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Computer Graphics and Visualization Techniques