SnapMoGen: Human Motion Generation from Expressive Texts

Chuan Guo; Inwoo Hwang; Jian Wang; Bing Zhou

arXiv:2507.09122 · cs.CV · October 24, 2025

TL;DR

SnapMoGen introduces a text-motion dataset of 20K motion capture clips (44 hours) paired with 122K expressive descriptions, along with an improved generative masked model, enabling fine-grained control and long-term motion generation.

Contribution

The paper presents a new dataset with detailed textual annotations and a novel transformer-based model that improves motion generation quality and controllability from expressive text prompts.

Findings

01. Achieved state-of-the-art results on the HumanML3D and SnapMoGen benchmarks.

02. Demonstrated effective long-term motion generation and blending.

03. Supported casual user prompts by reformatting them with an LLM.

Abstract

Text-to-motion generation has experienced remarkable progress in recent years. However, current approaches remain limited to synthesizing motion from short or general text prompts, primarily due to dataset constraints. This limitation undermines fine-grained controllability and generalization to unseen prompts. In this paper, we introduce SnapMoGen, a new text-motion dataset featuring high-quality motion capture data paired with accurate, expressive textual annotations. The dataset comprises 20K motion clips totaling 44 hours, accompanied by 122K detailed textual descriptions averaging 48 words per description (vs. 12 words in HumanML3D). Importantly, these motion clips preserve the temporal continuity they had in the original long sequences, facilitating research in long-term motion generation and blending. We also improve upon previous generative masked modeling approaches. Our model,…

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Video Analysis and Summarization