Scaling Large Motion Models with Million-Level Human Motions
Ye Wang, Sipeng Zheng, Bin Cao, Qianshan Wei, Weishuai Zeng, Qin Jin, Zongqing Lu

TL;DR
This paper introduces MotionLib, a large-scale human motion dataset, and a new large motion model trained on it. It demonstrates that scaling both data and model size improves human motion generation, and it proposes a new motion encoding technique.
Contribution
It presents the first million-level human motion dataset with hierarchical descriptions and a novel motion encoding method, advancing large-scale motion generation models.
Findings
Scaling data and model size improves motion generation performance.
Motionbook encoding enhances motion representation and detail preservation.
MotionLib enables training of more versatile and robust motion models.
Abstract
Inspired by the recent success of LLMs, the field of human motion understanding has increasingly shifted toward developing large motion models. Despite some progress, current efforts remain far from achieving truly generalist models, primarily due to the lack of massive high-quality data. To address this gap, we present MotionLib, the first million-level dataset for motion generation, which is at least 15× larger than existing counterparts and enriched with hierarchical text descriptions. Using MotionLib, we train a large motion model, demonstrating robust performance across a wide range of human activities, including unseen ones. Through systematic investigation, for the first time, we highlight the importance of scaling both data and model size for advancing motion generation, along with key insights to achieve this goal. To better integrate the motion modality,…
Taxonomy
Topics: Natural Language Processing Techniques
