UniMo: Unified Motion Generation and Understanding with Chain of Thought

Guocun Wang; Kenkun Liu; Jing Lin; Guorui Song; Jian Li; Xiaoguang Han

arXiv:2601.12126 · cs.AI · January 21, 2026

TL;DR

UniMo is a framework that unifies 3D human motion generation and understanding within a large language model. It combines interpretable chain-of-thought reasoning with reinforcement-learning post-training, and the authors report superior performance on 3D human motion tasks.

Contribution

It introduces a unified LLM-based approach that integrates chain-of-thought (CoT) reasoning, supervised fine-tuning, and reinforcement learning to improve both motion generation and motion understanding.

Findings

1. Outperforms existing models on motion tasks
2. Achieves state-of-the-art results in motion generation
3. Enhances interpretability and semantic alignment

Abstract

Existing 3D human motion generation and understanding methods often exhibit limited interpretability, restricting effective mutual enhancement between these inherently related tasks. While current unified frameworks based on large language models (LLMs) leverage linguistic priors, they frequently encounter challenges in semantic alignment and task coherence. Moreover, the next-token prediction paradigm in LLMs is ill-suited for motion sequences, causing cumulative prediction errors. To address these limitations, we propose UniMo, a novel framework that integrates motion-language information and interpretable chain of thought (CoT) reasoning into the LLM via supervised fine-tuning (SFT). We further introduce reinforcement learning with Group Relative Policy Optimization (GRPO) as a post-training strategy that optimizes over groups of tokens to enforce structural correctness and semantic…
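
The abstract names GRPO without giving its details. The sketch below illustrates the group-relative advantage at the core of standard GRPO (as introduced in DeepSeekMath): several completions are sampled for the same prompt, each is scored, and each reward is normalized by the group's mean and standard deviation, so no learned value model is needed. That per-completion advantage is then applied to every token of the completion inside a PPO-style clipped objective. Everything UniMo-specific here is an assumption: the `<think>`/`<answer>` template, the motion tokens such as `m12`, and the helper names `format_reward`, `group_relative_advantages`, and `clipped_surrogate` are hypothetical stand-ins for a "structural correctness" reward. The excerpt does not specify UniMo's actual reward functions or how its groups are formed.

```python
import re
import statistics

# Hypothetical CoT output template: reasoning inside <think>...</think>,
# final motion tokens inside <answer>...</answer>. Tag names are an assumption,
# not taken from the UniMo paper.
FORMAT_PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Hypothetical structural reward: 1.0 if the completion follows the template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by its group's mean and std.

    All rewards must come from completions sampled for the same prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one token, given pi_new/pi_old probability ratio."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Four sampled completions for one (hypothetical) text-to-motion prompt.
    group = [
        "<think>walk forward, then turn left</think><answer>m12 m7 m33</answer>",
        "<answer>m12 m7</answer>",
        "<think>raise both arms</think><answer>m4 m4 m9</answer>",
        "no structure at all",
    ]
    rewards = [format_reward(c) for c in group]
    advantages = group_relative_advantages(rewards)
    print(rewards)                            # [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

Because the advantages are centered within each group, well-structured completions are pushed up and malformed ones pushed down relative to their siblings, regardless of the absolute reward scale. A real setup would typically add task rewards (for example, text-motion semantic alignment) and a KL penalty to a reference model.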

Videos

UniMo: Unified Motion Generation and Understanding with Chain of Thought (Underline)

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications