Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics
Tianshuo Xu, Zhifei Chen, Leyi Wu, Hao Lu, Ying-cong Chen

TL;DR
This paper introduces Motion Forcing, a hierarchical framework that decouples physical reasoning from visual synthesis to improve robustness and physical consistency in complex video generation tasks.
Contribution
It proposes a novel Point-Shape-Appearance paradigm and a masked point recovery strategy to enhance physical understanding and stability in video generation.
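The summary names a masked point recovery strategy but does not describe it. The sketch below illustrates only the generic masked-modeling idea that the name suggests, under our own assumptions. Point tracks are stored as a T × N grid of (x, y) positions. Random entries are hidden, a recovery step fills them in, and the loss is scored on the hidden entries only. The helper names (`mask_points`, `recover`, `masked_mse`) are hypothetical. The linear-interpolation `recover` is a placeholder for whatever learned model the authors use, so this is not the paper's method.

```python
import random

XY = tuple[float, float]  # assumed representation: one 2D point position


def mask_points(traj: list[list[XY]], ratio: float, seed: int = 0) -> list[list[bool]]:
    """Return a T x N mask where True marks a hidden entry.

    First and last frames stay visible so every hidden entry has visible
    neighbours in time (an assumption made for this toy recovery step)."""
    rng = random.Random(seed)
    T, N = len(traj), len(traj[0])
    mask = [[False] * N for _ in range(T)]
    for t in range(1, T - 1):
        for n in range(N):
            mask[t][n] = rng.random() < ratio
    return mask


def recover(traj: list[list[XY]], mask: list[list[bool]]) -> list[list[XY]]:
    """Hypothetical stand-in for a learned recovery model: fill each hidden
    entry by linear interpolation between the nearest visible frames of the
    same point. Only visible entries of `traj` are read."""
    T, N = len(traj), len(traj[0])
    out = [list(row) for row in traj]
    for n in range(N):
        visible = [t for t in range(T) if not mask[t][n]]
        for t in range(T):
            if mask[t][n]:
                lo = max(v for v in visible if v < t)
                hi = min(v for v in visible if v > t)
                w = (t - lo) / (hi - lo)
                (x0, y0), (x1, y1) = traj[lo][n], traj[hi][n]
                out[t][n] = (x0 + w * (x1 - x0), y0 + w * (y1 - y0))
    return out


def masked_mse(pred, target, mask) -> float:
    """Mean squared error over hidden entries only (0.0 if none are hidden)."""
    errs = [
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        for prow, qrow, mrow in zip(pred, target, mask)
        for p, q, m in zip(prow, qrow, mrow)
        if m
    ]
    return sum(errs) / len(errs) if errs else 0.0


# Usage: two points moving at constant velocity over 5 frames.
traj = [[(float(t), 0.0), (0.0, 2.0 * t)] for t in range(5)]
mask = mask_points(traj, ratio=0.5, seed=1)
print(masked_mse(recover(traj, mask), traj, mask))  # near zero for linear motion
```

Scoring only the hidden entries is the usual masked-modeling choice. Visible entries are copied through unchanged, so including them would dilute the loss without testing recovery.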
Findings
Outperforms state-of-the-art methods on autonomous driving benchmarks
Maintains physical consistency in complex scenes with collisions or dense traffic
Demonstrates generality across physics and robotics applications
Abstract
The ultimate goal of video generation is to satisfy a fundamental trilemma: achieving high visual quality, maintaining rigorous physical consistency, and enabling precise controllability. While recent models can maintain this balance in simple, isolated scenarios, we observe that this equilibrium is fragile and often breaks down as scene complexity increases (e.g., involving collisions or dense traffic). To address this, we introduce Motion Forcing, a framework designed to stabilize this trilemma even in complex generative tasks. Our key insight is to explicitly decouple physical reasoning from visual synthesis via a hierarchical "Point-Shape-Appearance" paradigm. This approach decomposes generation into verifiable stages: modeling complex dynamics as sparse geometric anchors (Point), expanding them into dynamic depth maps that explicitly resolve 3D geometry…
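The following toy sketch shows how data might flow through Point, Shape, and Appearance stages. It adds a check at the Point-to-Shape boundary to show what a "verifiable" intermediate could look like. Every stage is a hypothetical stand-in:

- `point_stage` hard-codes moving anchors.
- `shape_stage` densifies them by nearest-anchor fill rather than with a learned model.
- `appearance_stage` maps depth to gray levels in place of a video generator.

The abstract is truncated before it describes the Appearance stage, so that stage's role here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    """Assumed form of a sparse geometric anchor: pixel position plus depth."""
    x: int        # pixel column
    y: int        # pixel row
    depth: float


def point_stage(num_frames: int) -> list[list[Anchor]]:
    """Hypothetical 'Point' stage: sparse anchors per frame. Two anchors move
    at constant velocity here; a real system would predict them."""
    return [[Anchor(1 + t, 1, 5.0), Anchor(6, 6 - t, 2.0 + t)] for t in range(num_frames)]


def shape_stage(anchors: list[Anchor], height: int, width: int) -> list[list[float]]:
    """Hypothetical 'Shape' stage: densify anchors into a per-frame depth map
    by copying the depth of the nearest anchor to every pixel."""
    return [
        [min(anchors, key=lambda a: (a.x - c) ** 2 + (a.y - r) ** 2).depth for c in range(width)]
        for r in range(height)
    ]


def appearance_stage(depth_map: list[list[float]], max_depth: float = 10.0) -> list[list[int]]:
    """Hypothetical 'Appearance' stage: near surfaces bright, far ones dark (0-255)."""
    return [[round(255 * (1 - min(d, max_depth) / max_depth)) for d in row] for row in depth_map]


def check_shape(anchors: list[Anchor], depth_map: list[list[float]]) -> bool:
    """Stage-boundary check: the depth map must reproduce every anchor's depth."""
    return all(depth_map[a.y][a.x] == a.depth for a in anchors)


# Usage: run three frames through the stages, verifying each intermediate.
for anchors in point_stage(3):
    depth = shape_stage(anchors, height=8, width=8)
    assert check_shape(anchors, depth)
    image = appearance_stage(depth)
```

Each stage emits an explicit, inspectable output: sparse anchors first, then dense depth. A check like `check_shape` can therefore run between stages, whereas a single end-to-end pixel generator exposes no such intermediate to test.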
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Human Motion and Animation
