MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent

Xinyao Liao; Xianfang Zeng; Liao Wang; Gang Yu; Guosheng Lin; Chi Zhang

arXiv:2502.03207 · cs.CV · October 16, 2025

TL;DR

MotionAgent is a method for fine-grained, text-guided image-to-video generation. It converts the motion described in a text prompt into explicit motion fields, enabling precise control over object and camera movement.

Contribution

The paper presents a motion field agent that extracts object and camera motion from the text prompt and converts them into object trajectories and camera extrinsics, improving control accuracy in video generation.

Findings

01. Achieves a significant improvement on the Video-Text Camera Motion metrics on VBench.

02. Outperforms existing models in motion generation accuracy.

03. Integrates object trajectories and camera extrinsics in 3D space and projects them into a unified optical flow.

Abstract

We propose MotionAgent, enabling fine-grained motion control for text-guided image-to-video generation. The key technique is the motion field agent that converts motion information in text prompts into explicit motion fields, providing flexible and precise motion guidance. Specifically, the agent extracts the object movement and camera motion described in the text and converts them into object trajectories and camera extrinsics, respectively. An analytical optical flow composition module integrates these motion representations in 3D space and projects them into a unified optical flow. An optical flow adapter takes the flow to control the base image-to-video diffusion model for generating fine-grained controlled videos. The significant improvement in the Video-Text Camera Motion metrics on VBench indicates that our method achieves precise control over camera motion. We construct a subset…
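The composition step described in the abstract (combine object motion and camera motion in 3D, then project to a single optical flow) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a known per-pixel depth map, a pinhole intrinsic matrix `K`, a camera extrinsic change `(R, t)` between two frames, and a single rigid 3D displacement for the moving object. The function name `compose_flow` and all parameter names are hypothetical.

```python
import numpy as np

def compose_flow(depth, K, R, t, object_shift=None, object_mask=None):
    """Return an (H, W, 2) optical flow from frame 0 to frame 1.

    Illustrative sketch only (names and inputs are assumptions):
      depth        (H, W) per-pixel depth in frame 0
      K            (3, 3) pinhole camera intrinsics
      R, t         rotation (3, 3) and translation (3,) mapping frame-0
                   camera coordinates to frame-1 camera coordinates
      object_shift optional (3,) 3D displacement of the moving object
      object_mask  optional (H, W) boolean mask of the moving object
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels

    # Unproject every pixel to a 3D point: X = depth * K^-1 [u, v, 1]^T
    pts = (pix @ np.linalg.inv(K).T) * depth[..., None]

    # Object motion: displace only the masked points in 3D.
    if object_shift is not None and object_mask is not None:
        shift = np.asarray(object_shift, dtype=np.float64)
        pts = pts + object_mask[..., None] * shift

    # Camera motion: express the points in the frame-1 camera.
    pts_cam1 = pts @ np.asarray(R).T + np.asarray(t)

    # Project back to the image plane and take the pixel displacement.
    proj = pts_cam1 @ np.asarray(K).T
    uv1 = proj[..., :2] / proj[..., 2:3]
    return uv1 - pix[..., :2]
```

As a sanity check on the geometry: with constant depth 2, focal length 100, identity rotation, and `t = [0.1, 0, 0]`, every pixel moves by 100 × 0.1 / 2 = 5 pixels horizontally. A flow field of this kind is what the abstract's optical flow adapter would take as its control signal; how the paper obtains depth, trajectories, and extrinsics from text is not covered by this sketch.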

Code & Models

Models

leoisufa/MotionAgent (Hugging Face model)

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Face Recognition and Analysis

Methods: Diffusion · Balanced Selection · Adapter