C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
Yuhao Li, Mirana Claire Angel, Salman Khan, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

TL;DR
C-Drag introduces a chain-of-thought reasoning framework for motion control in video generation, explicitly modeling object interactions and dynamics, leading to more accurate and controllable video synthesis.
Contribution
The paper proposes a novel chain-of-thought-based motion controller that incorporates object perception and dynamic interaction reasoning for improved controllable video generation.
Findings
C-Drag outperforms existing methods in object motion control metrics.
The new VOI dataset enables comprehensive evaluation of interaction-aware video generation.
Experimental results demonstrate effective modeling of object interactions and dynamics.
Abstract
Trajectory-based motion control has emerged as an intuitive and efficient approach for controllable video generation. However, existing trajectory-based approaches usually generate only the motion trajectory of the controlled object and ignore the dynamic interactions between the controlled object and its surroundings. To address this limitation, we propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag. Instead of directly generating the motion of some objects, C-Drag first performs object perception and then reasons about the dynamic interactions between different objects according to the given motion control of the objects. Specifically, our method includes an object perception module and a Chain-of-Thought-based motion reasoning module. The object perception module employs visual language models to capture the position…
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
Methods: Diffusion
