IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation

Yuan-Ming Li; Qize Yang; Nan Lei; Shenghao Fu; Ling-An Zeng; Jian-Fang Hu; Xihan Wei; Wei-Shi Zheng

arXiv:2512.10730 · cs.CV · December 12, 2025


Open Access

TL;DR

IRG-MotionLLM introduces an interleaved reasoning paradigm that couples motion generation, assessment, and refinement through iterative text-motion dialogue, with the goal of improving text-to-motion generation performance.

Contribution

This work presents the first model to seamlessly interleave motion generation, assessment, and refinement, utilizing a novel three-stage training scheme and automated data synthesis.

Findings

01 Assessment and refinement improve text-motion alignment.

02 Interleaving steps yield performance gains across training stages.

03 IRG-MotionLLM outperforms baseline models on benchmarks.

Abstract

Recent advances in motion-aware large language models have shown remarkable promise for unifying motion understanding and generation tasks. However, these models typically treat understanding and generation separately, limiting the mutual benefits that could arise from interactive feedback between tasks. In this work, we reveal that motion assessment and refinement tasks act as crucial bridges to enable bidirectional knowledge flow between understanding and generation. Leveraging this insight, we propose Interleaved Reasoning for Motion Generation (IRMoGen), a novel paradigm that tightly couples motion generation with assessment and refinement through iterative text-motion dialogue. To realize this, we introduce IRG-MotionLLM, the first model that seamlessly interleaves motion generation, assessment, and refinement to improve generation performance. IRG-MotionLLM is developed…
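The generate, assess, refine loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`interleaved_generate`, `generate`, `assess`, `refine`), the `max_rounds` cap, the stop-when-aligned rule, and the list-of-floats stand-in for a motion are all assumptions made for illustration. The toy callables only show the control flow; in the actual model each step would presumably be produced by the language model itself.

```python
from typing import Callable, List, Tuple

# Placeholder motion type: a real system would use motion tokens or poses.
Motion = List[float]


def interleaved_generate(
    prompt: str,
    generate: Callable[[str], Motion],
    assess: Callable[[str, Motion], Tuple[bool, str]],
    refine: Callable[[str, Motion, str], Motion],
    max_rounds: int = 3,  # hypothetical cap, not taken from the paper
) -> Tuple[Motion, List[str]]:
    """Generate a motion, then alternate assessment and refinement."""
    motion = generate(prompt)
    feedback_log: List[str] = []
    for _ in range(max_rounds):
        aligned, feedback = assess(prompt, motion)
        feedback_log.append(feedback)
        if aligned:  # assumed stopping rule: stop once judged aligned
            break
        motion = refine(prompt, motion, feedback)
    return motion, feedback_log


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_generate(prompt: str) -> Motion:
    return [0.0]


def toy_assess(prompt: str, motion: Motion) -> Tuple[bool, str]:
    aligned = len(motion) >= 3
    return aligned, "aligned" if aligned else "too short"


def toy_refine(prompt: str, motion: Motion, feedback: str) -> Motion:
    return motion + [float(len(motion))]


motion, log = interleaved_generate(
    "a person waves", toy_generate, toy_assess, toy_refine
)
print(motion)  # [0.0, 1.0, 2.0]
print(log)     # ['too short', 'too short', 'aligned']
```

Passing the three steps as callables keeps the loop independent of any particular model, so the same skeleton could wrap a single model that plays all three roles or three separate components.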


Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis