MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Lee Hsin-Ying; Hanwen Jiang; Yiqun Mei; Jing Shi; Ming-Hsuan Yang; Zhixin Shu

arXiv:2605.22818 · cs.CV · May 22, 2026

TL;DR

MotiMotion introduces a reasoning-based framework for motion-controlled video generation that enhances plausibility and interaction realism by refining trajectories and hallucinating secondary motions.

Contribution

The paper presents a novel reasoning-then-generation approach, a confidence-aware control scheme, and a new benchmark for more realistic motion-controlled video synthesis.

Findings

1. Produces more plausible object behaviors and interactions.
2. Outperforms existing methods in human evaluations.
3. Demonstrates effectiveness on the new MotiBench dataset.

Abstract

Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. This reliance often yields unnatural or implausible outcomes, especially when secondary causal consequences are missing. To address this, we introduce MotiMotion, a novel framework that reformulates motion control as a reasoning-then-generation problem. To encourage causally grounded and commonsense-consistent interactions, we leverage a training-free vision-language reasoner to refine the image-space coordinates of primary trajectories and to hallucinate plausible secondary motions. To further improve motion naturalness, we propose a confidence-aware control scheme that modulates guidance strength, enabling the model to closely follow high-confidence plans while correcting artifacts under low-confidence inputs with its internal generative…
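The abstract says guidance strength is modulated by confidence but does not state the rule. The sketch below illustrates one way such modulation could work, assuming a classifier-free-guidance-style blend in which a scalar confidence in [0, 1] is mapped linearly to the guidance scale. At zero confidence the output falls back to the unconditional (prior-only) prediction; at full confidence it is pushed toward the trajectory-conditioned one. This is a swapped-in, generic technique, not the paper's published scheme, and the names `guidance_weight`, `guided_prediction`, `w_min` and `w_max` are hypothetical.

```python
from typing import List, Sequence


def guidance_weight(confidence: float, w_min: float = 0.0, w_max: float = 2.0) -> float:
    """Map a confidence score to a guidance scale (hypothetical linear schedule).

    The confidence is clamped to [0, 1]; 0 maps to w_min and 1 maps to w_max.
    """
    c = min(max(confidence, 0.0), 1.0)
    return w_min + c * (w_max - w_min)


def guided_prediction(
    pred_uncond: Sequence[float],
    pred_cond: Sequence[float],
    confidence: float,
) -> List[float]:
    """Classifier-free-guidance-style blend: u + w(c) * (cond - u).

    With w = 0 the result is the unconditional (prior-only) prediction,
    with w = 1 it equals the trajectory-conditioned prediction, and
    w > 1 pushes further toward the condition.
    """
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    w = guidance_weight(confidence)
    return [u + w * (c - u) for u, c in zip(pred_uncond, pred_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # toy denoiser output without trajectory conditioning
    cond = [1.0, 3.0]    # toy denoiser output with trajectory conditioning
    for conf in (0.0, 0.5, 1.0):
        print(conf, guided_prediction(uncond, cond, conf))
```

A single scalar confidence is the simplest case. Per-trajectory or per-region confidence would need a spatial weight map applied elementwise in place of the scalar `w`, which this sketch leaves out.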

Code & Models

Repositories

motimotion/motimotion (GitHub)

Datasets

shinying/motibench (dataset, 128 downloads)
