Learning Generalizable Human Motion Generator with Reinforcement Learning
Yunyao Mao, Xiaoyang Liu, Wengang Zhou, Zhenbo Lu, Houqiang Li

TL;DR
This paper introduces InstructMotion, a reinforcement learning-based approach for text-driven human motion generation that improves generalization to unseen prompts by leveraging contrastive encoders and synthetic data.
Contribution
The paper proposes a reinforcement learning framework that uses contrastively pre-trained encoders to improve generalization in text-to-motion generation and to reduce overfitting to the motion expressions seen in training.
Findings
Outperforms existing methods on standard benchmarks
Effective in generating diverse and novel motions
Utilizes synthetic text-only data for better generalization
Abstract
Text-driven human motion generation, as one of the vital tasks in computer-aided content creation, has recently attracted increasing attention. While pioneering research has largely focused on improving numerical performance metrics on given datasets, practical applications reveal a common challenge: existing methods often overfit specific motion expressions in the training data, hindering their ability to generalize to novel descriptions such as unseen combinations of motions. This limitation restricts their broader applicability. We argue that the aforementioned problem primarily arises from the scarcity of available motion-text pairs, given the many-to-many nature of text-driven motion generation. To tackle this problem, we formulate text-to-motion generation as a Markov decision process and present InstructMotion, which incorporates the trial-and-error paradigm in reinforcement…
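As a rough illustration of what treating generation as a Markov decision process can look like, the toy sketch below casts token-by-token generation as an episode: the state is the sequence generated so far, the action is the next token, and a scalar reward arrives once the sequence is complete. It is not the paper's implementation. The vocabulary size, horizon, `toy_reward` function, and the plain REINFORCE update with a moving-average baseline are all assumptions chosen to keep the example self-contained; in particular, `toy_reward` is a hypothetical stand-in for whatever text-motion alignment score a real system would use.

```python
import math
import random

VOCAB_SIZE = 4  # hypothetical number of discrete motion tokens
HORIZON = 5     # hypothetical number of tokens generated per episode


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(tokens, target_token):
    """Hypothetical reward: fraction of tokens matching the prompt's target."""
    return sum(t == target_token for t in tokens) / len(tokens)


def sample_episode(logits, rng):
    """Roll out one episode: each action appends one token to the sequence.

    This toy policy ignores the state and shares one logit vector across steps.
    """
    probs = softmax(logits)
    return [rng.choices(range(VOCAB_SIZE), weights=probs)[0] for _ in range(HORIZON)]


def reinforce(target_token, steps=2000, lr=0.1, seed=0):
    """Trial-and-error training with REINFORCE and a moving-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * VOCAB_SIZE
    baseline = 0.0
    for _ in range(steps):
        tokens = sample_episode(logits, rng)
        reward = toy_reward(tokens, target_token)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        probs = softmax(logits)
        # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k].
        for action in tokens:
            for k in range(VOCAB_SIZE):
                grad = (1.0 if k == action else 0.0) - probs[k]
                logits[k] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    print(reinforce(target_token=2))
```

Because the reward is computed on sampled sequences rather than on paired ground-truth motions, a setup of this kind can in principle be driven by text prompts alone, which is the property the summary above attributes to the use of synthetic text-only data.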
