Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Hyeonho Jeong, Chun-Hao Paul Huang, Jong Chul Ye, Niloy Mitra, Duygu Ceylan

TL;DR
Track4Gen adds point-tracking supervision to a video diffusion model, reducing appearance drift and improving temporal coherence in generated videos.
Contribution
It is the first to unify video generation and point tracking in a single model, using spatial supervision on the diffusion features to improve visual consistency in generated videos.
Findings
Reduces appearance drift in generated videos
Enhances temporal stability and visual coherence
Unifies video generation and point tracking tasks
Abstract
While recent foundational video generators produce visually rich output, they still struggle with appearance drift, where objects gradually degrade or change inconsistently across frames, breaking visual coherence. We hypothesize that this is because there is no explicit supervision in terms of spatial tracking at the feature level. We propose Track4Gen, a spatially aware video generator that combines video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. Track4Gen merges the video generation and point tracking tasks into a single network by making minimal changes to existing video generation architectures. Using Stable Video Diffusion as a backbone, Track4Gen demonstrates that it is possible to unify video generation and point tracking, which are typically handled as separate tasks. Our extensive evaluations show that…
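The idea of combining a video diffusion loss with point-tracking supervision on the features can be illustrated with a toy objective. The following is a minimal sketch, not the paper's implementation: the function names, the cosine-similarity soft-argmax matcher, the L1 tracking penalty, the `temperature` value, and the weight `lam` are all assumptions made for illustration, and plain Python lists stand in for real diffusion features.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length flat lists (stand-in for the diffusion loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def soft_argmax_match(query_feat, target_feats, temperature=0.1):
    """Predict where a query feature lands in the target frame.

    target_feats maps (x, y) grid positions to feature vectors. The prediction is
    the softmax-weighted average position under cosine similarity (an assumed matcher).
    """
    positions = list(target_feats)
    logits = [cosine(query_feat, target_feats[p]) / temperature for p in positions]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    z = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, positions)) / z
    y = sum(w * p[1] for w, p in zip(weights, positions)) / z
    return (x, y)

def tracking_loss(src_feats, tgt_feats, tracks):
    """Mean L1 distance between predicted and ground-truth target positions.

    tracks is a list of ((src_x, src_y), (tgt_x, tgt_y)) point correspondences.
    """
    total = 0.0
    for src, gt in tracks:
        px, py = soft_argmax_match(src_feats[src], tgt_feats)
        total += abs(px - gt[0]) + abs(py - gt[1])
    return total / len(tracks)

def joint_loss(noise_pred, noise_target, src_feats, tgt_feats, tracks, lam=1.0):
    """Diffusion loss plus a weighted tracking loss (lam is a hypothetical weight)."""
    return mse(noise_pred, noise_target) + lam * tracking_loss(src_feats, tgt_feats, tracks)

# Toy example: a 2x1 feature grid where the "object" feature [1, 0]
# moves from position (0, 0) in the first frame to (1, 0) in the second.
frame_a = {(0, 0): [1.0, 0.0], (1, 0): [0.0, 1.0]}
frame_b = {(0, 0): [0.0, 1.0], (1, 0): [1.0, 0.0]}
good_tracks = [((0, 0), (1, 0))]  # correspondence that agrees with the features
bad_tracks = [((0, 0), (0, 0))]   # correspondence that contradicts the features

loss_good = joint_loss([0.1, 0.2], [0.1, 0.2], frame_a, frame_b, good_tracks)
loss_bad = joint_loss([0.1, 0.2], [0.1, 0.2], frame_a, frame_b, bad_tracks)
```

In this toy setup the tracking term is small when the features of corresponding points agree across frames and large when they do not, which is the kind of feature-level spatial supervision the abstract describes adding alongside the diffusion objective.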
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Video Analysis and Summarization · Human Motion and Animation
Methods: Attentive Walk-Aggregating Graph Neural Network · Diffusion
