Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Ke Fan; Shunlin Lu; Minyue Dai; Runyi Yu; Lixing Xiao; Zhiyang Dou; Junting Dong; Lizhuang Ma; Jingbo Wang

arXiv:2507.07095 · cs.CV · July 10, 2025

TL;DR

This paper introduces MotionMillion, the largest human motion dataset to date (over 2,000 hours and 2 million motion sequences), and a scalable model that achieves zero-shot generalization in text-to-motion generation.

Contribution

It presents MotionMillion, a large-scale dataset, and a comprehensive benchmark, along with a scalable model that demonstrates strong zero-shot generalization capabilities.

Findings

01

Achieved state-of-the-art zero-shot motion generation performance.

02

Demonstrated strong out-of-domain and complex motion generalization.

03

Provided a new large-scale dataset and evaluation framework.

Abstract

Generating diverse and natural human motion sequences from textual descriptions is a fundamental and challenging research area within computer vision, graphics, and robotics. Despite significant advances in this field, current methods often struggle with zero-shot generalization, largely because training datasets are limited in size. Moreover, the lack of a comprehensive evaluation framework impedes progress on this task by failing to identify directions for improvement. In this work, we aim to push text-to-motion into a new era, namely achieving zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion, the largest human motion dataset to date, featuring over 2,000 hours and 2 million high-quality motion sequences. Additionally, we…

Code & Models

Datasets

InternRobotics/MotionMillion
dataset · 445 downloads

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition