SnapMoGen: Human Motion Generation from Expressive Texts
Chuan Guo, Inwoo Hwang, Jian Wang, Bing Zhou

TL;DR
SnapMoGen introduces a text-motion dataset of 20K motion capture clips (44 hours) with 122K expressive text descriptions, together with an improved generative masked model for human motion synthesis, enabling fine-grained control and long-term motion generation.
Contribution
The paper presents a new dataset with detailed textual annotations and a novel transformer-based model that improves motion generation quality and controllability from expressive text prompts.
Findings
Achieved state-of-the-art results on HumanML3D and SnapMoGen benchmarks.
Demonstrated effective long-term motion generation and blending.
Handled casual user prompts by reformatting them with an LLM.
Abstract
Text-to-motion generation has experienced remarkable progress in recent years. However, current approaches remain limited to synthesizing motion from short or general text prompts, primarily due to dataset constraints. This limitation undermines fine-grained controllability and generalization to unseen prompts. In this paper, we introduce SnapMoGen, a new text-motion dataset featuring high-quality motion capture data paired with accurate, expressive textual annotations. The dataset comprises 20K motion clips totaling 44 hours, accompanied by 122K detailed textual descriptions averaging 48 words per description (vs. 12 words in HumanML3D). Importantly, these motion clips preserve the temporal continuity of the long sequences they were taken from, facilitating research in long-term motion generation and blending. We also improve upon previous generative masked modeling approaches. Our model,…
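As a rough sanity check on the dataset figures quoted in the abstract, the short Python sketch below derives the implied average clip length and the number of descriptions per clip. The inputs are the rounded values from the abstract ("20K", "44 hours", "122K", "48 words"), so the outputs are approximate, and the helper name `dataset_stats` is purely illustrative rather than anything from the paper or its code.

```python
# Back-of-the-envelope statistics from the rounded figures in the abstract.
# These are approximations, not numbers reported by the authors.

def dataset_stats(num_clips, total_hours, num_descriptions, words_per_description):
    """Derive simple per-clip averages from dataset-level totals."""
    total_seconds = total_hours * 3600
    return {
        # Average duration of one motion clip, in seconds.
        "avg_clip_seconds": total_seconds / num_clips,
        # Average number of text descriptions attached to one clip.
        "descriptions_per_clip": num_descriptions / num_clips,
        # Approximate total number of words across all descriptions.
        "total_words": num_descriptions * words_per_description,
    }

stats = dataset_stats(
    num_clips=20_000,
    total_hours=44,
    num_descriptions=122_000,
    words_per_description=48,
)

for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

Under these rounded inputs, each clip lasts roughly 8 seconds and carries about 6 descriptions, for a text corpus on the order of 5.9 million words.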
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Video Analysis and Summarization
