Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong; Chun-Hao Paul Huang; Jong Chul Ye; Niloy Mitra; Duygu Ceylan

arXiv:2412.06016 · cs.CV · April 8, 2025

TL;DR

Track4Gen introduces a novel method that integrates point tracking into video diffusion models, significantly reducing appearance drift and enhancing temporal coherence in generated videos.

Contribution

Track4Gen unifies video generation and point tracking in a single network with minimal architectural changes, adding spatial supervision on the diffusion features to improve visual consistency in generated videos.

Findings

01. Reduces appearance drift in generated videos

02. Enhances temporal stability and visual coherence

03. Unifies video generation and point tracking tasks

Abstract

While recent foundational video generators produce visually rich output, they still struggle with appearance drift, where objects gradually degrade or change inconsistently across frames, breaking visual coherence. We hypothesize that this is because there is no explicit supervision in terms of spatial tracking at the feature level. We propose Track4Gen, a spatially aware video generator that combines video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. Track4Gen merges the video generation and point tracking tasks into a single network by making minimal changes to existing video generation architectures. Using Stable Video Diffusion as a backbone, Track4Gen demonstrates that it is possible to unify video generation and point tracking, which are typically handled as separate tasks. Our extensive evaluations show that…
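The abstract describes combining a video diffusion loss with a point tracking objective on the diffusion features. The following is a minimal sketch of what such a joint objective could look like, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the soft-argmax over cosine similarity used to read a track out of the features, the L1 tracking error, and the `weight` that balances the two terms. The abstract does not specify any of these.

```python
import numpy as np

def diffusion_loss(pred_noise, true_noise):
    """Mean squared error between predicted and true noise (a common denoising objective)."""
    return float(np.mean((pred_noise - true_noise) ** 2))

def soft_argmax_track(query_feat, target_feats, temperature=0.1):
    """Estimate where a query feature lands in a target frame (hypothetical helper).

    query_feat:   (C,) feature sampled at the query point in the source frame.
    target_feats: (H, W, C) feature map of the target frame.
    Returns the expected (x, y) position under a softmax over cosine similarity.
    """
    h, w, c = target_feats.shape
    flat = target_feats.reshape(-1, c)
    q = query_feat / (np.linalg.norm(query_feat) + 1e-8)
    f = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    logits = (f @ q) / temperature
    logits -= logits.max()              # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    ys, xs = np.divmod(np.arange(h * w), w)
    return np.array([probs @ xs, probs @ ys])

def tracking_loss(feats, query_xy, gt_tracks, temperature=0.1):
    """Mean L1 error between feature-derived tracks and ground-truth tracks.

    feats:     (T, H, W, C) per-frame feature maps.
    query_xy:  (N, 2) integer (x, y) query points in frame 0.
    gt_tracks: (T, N, 2) ground-truth (x, y) positions of each point per frame.
    """
    total, count = 0.0, 0
    for n, (x, y) in enumerate(query_xy):
        q = feats[0, y, x]
        for t in range(1, feats.shape[0]):
            pred = soft_argmax_track(q, feats[t], temperature)
            total += np.abs(pred - gt_tracks[t, n]).sum()
            count += 1
    return float(total / max(count, 1))

def joint_loss(pred_noise, true_noise, feats, query_xy, gt_tracks, weight=1.0):
    """Diffusion loss plus a weighted tracking loss (the weighting scheme is an assumption)."""
    return diffusion_loss(pred_noise, true_noise) + weight * tracking_loss(
        feats, query_xy, gt_tracks
    )

if __name__ == "__main__":
    # Toy data: every pixel has a unique one-hot feature, and frame 1 is
    # frame 0 shifted one pixel to the right.
    H = W = 4
    frame0 = np.eye(H * W).reshape(H, W, H * W)
    feats = np.stack([frame0, np.roll(frame0, 1, axis=1)])
    query_xy = np.array([[1, 1]])
    gt_tracks = np.array([[[1, 1]], [[2, 1]]], dtype=float)
    noise = np.zeros((2, 3))
    print(joint_loss(noise, noise, feats, query_xy, gt_tracks, weight=0.5))
```

In this toy setup the features are perfectly discriminative, so the tracking term is close to zero when the ground-truth track matches the one-pixel shift and grows when it does not. In a real model the features would come from intermediate layers of the video diffusion network, and the tracking term would penalize features that fail to stay consistent along a point's trajectory.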

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Video Analysis and Summarization · Human Motion and Animation

Methods: Attentive Walk-Aggregating Graph Neural Network · Diffusion