TrackGo: A Flexible and Efficient Method for Controllable Video Generation
Haitao Zhou, Chuang Wang, Rui Nie, Jinlin Liu, Dongdong Yu, Qian Yu, Changhu Wang

TL;DR
TrackGo introduces a flexible, efficient method for controllable video generation using free-form masks and arrows, with a lightweight adapter that enhances control precision and achieves state-of-the-art results.
Contribution
The paper presents TrackGo, a controllable video generation framework that takes free-form masks and arrows as user input and uses TrackAdapter, a lightweight adapter, to improve control precision and efficiency.
Findings
Achieves state-of-the-art FVD, FID, and ObjMC scores.
Provides precise control over complex video scenarios.
Introduces TrackAdapter, a lightweight adapter that integrates into the temporal self-attention layers of a pretrained video generation model.
Abstract
Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce TrackGo, a novel approach that leverages free-form masks and arrows for conditional video generation. This method offers users a flexible and precise mechanism for manipulating video content. We also propose the TrackAdapter for control implementation, an efficient and lightweight adapter designed to be seamlessly integrated into the temporal self-attention layers of a pretrained video generation model. This design leverages our observation that the attention map of these layers can accurately activate regions corresponding to motion in videos. Our experimental results demonstrate…
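As a rough illustration of the input described in the abstract, the sketch below turns one free-form mask and one arrow into per-frame target positions for the masked pixels. It is not the paper's preprocessing: the helper name `arrow_to_trajectories`, the representation of a mask as a set of pixel coordinates, and the assumption of uniform linear motion along the arrow are all hypothetical choices made for this example.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinate


def arrow_to_trajectories(
    mask: Set[Point],
    arrow_start: Point,
    arrow_end: Point,
    num_frames: int,
) -> List[List[Tuple[float, float]]]:
    """Return, for each frame, the displaced positions of the mask pixels.

    Hypothetical helper: linear motion along the arrow is an assumption
    of this sketch, not a detail taken from the paper.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    # Total displacement encoded by the arrow.
    dx = arrow_end[0] - arrow_start[0]
    dy = arrow_end[1] - arrow_start[1]
    frames = []
    for t in range(num_frames):
        alpha = t / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        # Shift every masked pixel by the same fraction of the arrow.
        frames.append([(x + alpha * dx, y + alpha * dy) for (x, y) in sorted(mask)])
    return frames


if __name__ == "__main__":
    mask = {(10, 10), (11, 10)}
    traj = arrow_to_trajectories(mask, (10, 10), (14, 18), num_frames=5)
    print(traj[0])   # positions in the first frame (unmoved)
    print(traj[-1])  # positions in the last frame (moved by the full arrow)
```

Under these assumptions, the masked region stays in place in the first frame and has moved by the full arrow in the last one. A control signal like this could then be supplied to an adapter such as the one the abstract describes, though how TrackGo actually encodes and injects it is not specified in the text above.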
Taxonomy
Topics: Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need · Adapter
