The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Jing Lin; Ruisi Wang; Junzhe Lu; Ziqi Huang; Guorui Song; Ailing Zeng; Xian Liu; Chen Wei; Wanqi Yin; Qingping Sun; Zhongang Cai; Lei Yang; Ziwei Liu

arXiv:2510.26794 · cs.CV · March 31, 2026

TL;DR

This paper introduces a framework for improving generalization in 3D human motion generation by transferring insights from video generation. It comprises a large dataset (ViMoGen-228K), a diffusion model (ViMoGen), and a benchmark (MBench).

Contribution

The paper transfers data, modeling, and evaluation strategies from video generation to motion generation. Its model, ViMoGen, is a flow-matching diffusion transformer.

Findings

01. The new dataset ViMoGen-228K significantly expands semantic diversity.

02. The proposed ViMoGen model outperforms existing methods in quality and generalization.

03. The benchmark MBench enables fine-grained evaluation of motion generation models.

Abstract

Despite recent advances in 3D human motion generation (MoGen) on standard benchmarks, existing text-to-motion models still face a fundamental bottleneck in their generalization capability. In contrast, adjacent generative fields, most notably video generation (ViGen), have demonstrated remarkable generalization in modeling human behaviors, highlighting transferable insights that MoGen can leverage. Motivated by this observation, we present a comprehensive framework that systematically transfers knowledge from ViGen to MoGen across three key pillars: data, modeling, and evaluation. First, we introduce ViMoGen-228K, a large-scale dataset comprising 228,000 high-quality motion samples that integrates high-fidelity optical MoCap data with semantically annotated motions from web videos and synthesized samples generated by state-of-the-art ViGen models. The dataset includes both text-motion…

Code & Models

Datasets

wruisi/ViMoGen-228K (dataset, 603 downloads)

Videos

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation · SlidesLive