C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Yuhao Li; Mirana Claire Angel; Salman Khan; Yu Zhu; Jinqiu Sun; Yanning Zhang; Fahad Shahbaz Khan

arXiv:2502.19868·cs.CV·February 28, 2025

Open Access · 1 Repo · 1 Model · 1 Dataset

TL;DR

C-Drag introduces a chain-of-thought reasoning framework for motion control in video generation, explicitly modeling object interactions and dynamics, leading to more accurate and controllable video synthesis.

Contribution

The paper proposes a novel chain-of-thought-based motion controller that incorporates object perception and dynamic interaction reasoning for improved controllable video generation.

Findings

01 C-Drag outperforms existing methods in object motion control metrics.

02 The new VOI dataset enables comprehensive evaluation of interaction-aware video generation.

03 Experimental results demonstrate effective modeling of object interactions and dynamics.

Abstract

Trajectory-based motion control has emerged as an intuitive and efficient approach to controllable video generation. However, existing trajectory-based approaches are usually limited to generating the motion trajectory of the controlled object and ignore the dynamic interactions between the controlled object and its surroundings. To address this limitation, we propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag. Instead of directly generating the motion of some objects, C-Drag first performs object perception and then reasons about the dynamic interactions between different objects according to the given motion control of the objects. Specifically, our method includes an object perception module and a Chain-of-Thought-based motion reasoning module. The object perception module employs visual language models to capture the position…

Code & Models

Repositories

weslee88524/c-drag-official-repo
Official

Models

🤗 UHow/C-Drag
model · ♡ 1

Datasets

UHow/VOI
dataset · 12 downloads

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: Diffusion