Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation

Yijie Qian; Juncheng Wang; Yuxiang Feng; Chao Xu; Wang Lu; Yang Liu; Baigui Sun; Yiqiang Chen; Yong Liu; Shujun Wang

arXiv:2512.24100 · cs.CV · January 1, 2026

Open Access

TL;DR

This paper introduces Latent Motion Reasoning (LMR), a two-stage approach inspired by hierarchical motor control in cognitive science. It separates semantic planning from physical execution in text-to-motion generation, and the authors report better semantic alignment and physical plausibility as a result.

Contribution

The paper proposes the Latent Motion Reasoning framework, built around a Dual-Granularity Tokenizer that separates planning from execution in text-to-motion generation, to address what the authors call the Semantic-Kinematic Impedance Mismatch.

Findings

01. Improved semantic alignment in generated motions.
02. Enhanced physical plausibility of motions.
03. Versatility demonstrated on multiple baseline models.

Abstract

Current state-of-the-art paradigms predominantly treat Text-to-Motion (T2M) generation as a direct translation problem, mapping symbolic language directly to continuous poses. While effective for simple actions, this System 1 approach faces a fundamental theoretical bottleneck we identify as the Semantic-Kinematic Impedance Mismatch: the inherent difficulty of grounding semantically dense, discrete linguistic intent into kinematically dense, high-frequency motion data in a single shot. In this paper, we argue that the solution lies in an architectural shift towards Latent System 2 Reasoning. Drawing inspiration from Hierarchical Motor Control in cognitive science, we propose Latent Motion Reasoning (LMR) that reformulates generation as a two-stage Think-then-Act decision process. Central to LMR is a novel Dual-Granularity Tokenizer that disentangles motion into two distinct manifolds: a…
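The two-stage Think-then-Act idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated before it describes the tokenizer, so every name here (`plan_tokens`, `execute_tokens`, `think_then_act`), the vocabulary sizes, and the fixed expansion ratio are assumptions, and seeded random draws stand in for the learned planner and decoder. The only point it makes is structural: a short, coarse plan is produced from text first, and a longer, fine-grained motion token sequence is then produced conditioned on that plan.

```python
import random
from typing import List, Tuple


def plan_tokens(text: str, num_plan: int, plan_vocab: int) -> List[int]:
    """Stage 1 ("think"): emit a short sequence of coarse plan tokens.

    Hypothetical stand-in for a learned planner; a text-seeded RNG
    replaces the model so the sketch stays self-contained.
    """
    rng = random.Random(text)
    return [rng.randrange(plan_vocab) for _ in range(num_plan)]


def execute_tokens(plan: List[int], frames_per_plan: int,
                   motion_vocab: int) -> List[int]:
    """Stage 2 ("act"): expand each plan token into several fine tokens.

    Hypothetical stand-in for a learned decoder; each fine token
    depends on the plan token it is expanded from.
    """
    rng = random.Random(0)
    motion = []
    for p in plan:
        for _ in range(frames_per_plan):
            motion.append((p * 31 + rng.randrange(motion_vocab)) % motion_vocab)
    return motion


def think_then_act(text: str, num_plan: int = 4, frames_per_plan: int = 8,
                   plan_vocab: int = 64,
                   motion_vocab: int = 512) -> Tuple[List[int], List[int]]:
    """Run the coarse planning stage, then the fine execution stage."""
    plan = plan_tokens(text, num_plan, plan_vocab)
    motion = execute_tokens(plan, frames_per_plan, motion_vocab)
    return plan, motion


if __name__ == "__main__":
    plan, motion = think_then_act("a person walks forward and sits down")
    print(len(plan), "plan tokens ->", len(motion), "motion tokens")
```

In a real system both stages would be learned models and the motion tokens would be decoded back to poses; the sketch keeps only the ordering of the two stages and the difference in sequence length between them.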

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Robot Manipulation and Learning