IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation
Yuan-Ming Li, Qize Yang, Nan Lei, Shenghao Fu, Ling-An Zeng, Jian-Fang Hu, Xihan Wei, Wei-Shi Zheng

TL;DR
IRG-MotionLLM introduces an innovative interleaved reasoning paradigm that tightly couples motion generation, assessment, and refinement through iterative dialogue, significantly enhancing text-to-motion generation performance.
Contribution
This work presents the first model to seamlessly interleave motion generation, assessment, and refinement, utilizing a novel three-stage training scheme and automated data synthesis.
Findings
Assessment and refinement improve text-motion alignment.
Interleaving steps yield performance gains across training stages.
IRG-MotionLLM outperforms baseline models on benchmarks.
Abstract
Recent advances in motion-aware large language models have shown remarkable promise for unifying motion understanding and generation tasks. However, these models typically treat understanding and generation separately, limiting the mutual benefits that could arise from interactive feedback between tasks. In this work, we reveal that motion assessment and refinement tasks act as crucial bridges to enable bidirectional knowledge flow between understanding and generation. Leveraging this insight, we propose Interleaved Reasoning for Motion Generation (IRMoGen), a novel paradigm that tightly couples motion generation with assessment and refinement through iterative text-motion dialogue. To realize this, we introduce IRG-MotionLLM, the first model that seamlessly interleaves motion generation, assessment, and refinement to improve generation performance. IRG-MotionLLM is developed…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
