Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation
Yijie Qian, Juncheng Wang, Yuxiang Feng, Chao Xu, Wang Lu, Yang Liu, Baigui Sun, Yiqiang Chen, Yong Liu, Shujun Wang

TL;DR
This paper introduces Latent Motion Reasoning, a two-stage approach inspired by hierarchical motor control in cognitive science. It improves text-to-motion generation by disentangling semantic planning from physical execution, which leads to better semantic alignment and physical plausibility.
Contribution
The paper proposes Latent Motion Reasoning (LMR), a framework built around a Dual-Granularity Tokenizer that separates planning from execution in text-to-motion generation. This separation addresses what the authors call the Semantic-Kinematic Impedance Mismatch.
Findings
Improved semantic alignment in generated motions.
Enhanced physical plausibility of motions.
Versatility demonstrated on multiple baseline models.
Abstract
Current state-of-the-art paradigms predominantly treat Text-to-Motion (T2M) generation as a direct translation problem, mapping symbolic language directly to continuous poses. While effective for simple actions, this System 1 approach faces a fundamental theoretical bottleneck we identify as the Semantic-Kinematic Impedance Mismatch: the inherent difficulty of grounding semantically dense, discrete linguistic intent into kinematically dense, high-frequency motion data in a single shot. In this paper, we argue that the solution lies in an architectural shift towards Latent System 2 Reasoning. Drawing inspiration from Hierarchical Motor Control in cognitive science, we propose Latent Motion Reasoning (LMR) that reformulates generation as a two-stage Think-then-Act decision process. Central to LMR is a novel Dual-Granularity Tokenizer that disentangles motion into two distinct manifolds: a…
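The abstract frames generation as a two-stage Think-then-Act process. However, this excerpt does not say how either stage or the tokenizer's two manifolds are implemented. The sketch below is a toy illustration of the general idea and not the authors' LMR method. It assumes that a "plan" is a short sequence of discrete codebook tokens at a low temporal rate, and that "acting" means expanding that plan into a dense, frame-rate trajectory. The functions `think` and `act`, the codebook, the `plan_latents` input (a stand-in for a text encoder's output), and the 4-frames-per-token rate are all hypothetical. A real system would use learned networks for both stages.

```python
import numpy as np

# Hypothetical toy setting: one coarse plan token covers 4 frames (not from the paper).
PLAN_RATE = 4


def think(codebook: np.ndarray, plan_latents: np.ndarray) -> np.ndarray:
    """Toy 'Think' stage: snap each coarse latent (a stand-in for text-derived
    intent) to its nearest codebook entry, giving a short sequence of discrete
    plan tokens."""
    # Squared distances between every latent and every code: (num_steps, codebook_size).
    d = ((plan_latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def act(codebook: np.ndarray, tokens: np.ndarray, rate: int = PLAN_RATE) -> np.ndarray:
    """Toy 'Act' stage: decode tokens to sparse keyposes, then fill in the dense,
    high-frequency trajectory by linear interpolation between consecutive keyposes."""
    keyposes = codebook[tokens]                   # (num_steps, pose_dim)
    t_key = np.arange(len(keyposes)) * rate       # frame index of each keypose
    t_all = np.arange(t_key[-1] + 1)              # every frame in the output
    # Interpolate each pose dimension independently over time.
    return np.stack(
        [np.interp(t_all, t_key, keyposes[:, j]) for j in range(keyposes.shape[1])],
        axis=1,
    )


if __name__ == "__main__":
    codebook = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    plan_latents = np.array([[0.1, 0.0, 0.0], [0.9, 0.1, 0.0], [1.1, 0.9, 0.0]])
    tokens = think(codebook, plan_latents)        # 3 discrete plan tokens
    motion = act(codebook, tokens)                # 9 frames of 3-D poses
    print(tokens, motion.shape)
```

In this toy version, three discrete tokens expand into nine frames. The point is only the asymmetry the abstract describes: a sparse, discrete plan on one side and a dense, continuous trajectory on the other. The paper's actual planning and execution components are not reproduced here.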
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Robot Manipulation and Learning
