Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Zichong Meng; Yiming Xie; Xiaogang Peng; Zeyu Han; Huaizu Jiang

arXiv:2411.16575 · cs.CV · July 10, 2025

TL;DR

This paper improves diffusion-based human motion generation by reworking the motion data representation, enabling masked autoregression, and proposing a more robust evaluation method, achieving state-of-the-art results that surpass VQ-based methods.

Contribution

It introduces a diffusion model with masked autoregression and an improved data representation, addressing the limitations of VQ-based methods and improving motion generation quality.

Findings

01. Outperforms previous methods on multiple datasets.

02. Achieves state-of-the-art performance in human motion generation.

03. Demonstrates the robustness of the proposed evaluation method.

Abstract

Since 2023, Vector Quantization (VQ)-based discrete generation methods have rapidly dominated human motion generation, primarily surpassing diffusion-based continuous generation methods in standard performance metrics. However, VQ-based methods have inherent limitations. Representing continuous motion data as limited discrete tokens leads to inevitable information loss, reduces the diversity of generated motions, and restricts their ability to function effectively as motion priors or generation guidance. In contrast, because diffusion-based methods generate in continuous space, they are well-suited to address these limitations and even offer potential for model scalability. In this work, we systematically investigate why current VQ-based methods perform well and explore the limitations of existing diffusion-based methods from the perspective of motion data representation and…
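To make the abstract's point about information loss from discrete tokens concrete, here is a toy sketch, not the paper's method: it replaces each continuous per-frame feature vector with its nearest entry in a codebook and measures the squared reconstruction error. The feature size, codebook sizes, and Gaussian "motion frames" are hypothetical stand-ins; a real VQ-VAE learns both the codebook and an encoder/decoder, which this sketch omits.

```python
import math
import random

def nearest_code(frame, codebook):
    """Return the codebook entry closest to `frame` in Euclidean distance."""
    return min(codebook, key=lambda code: math.dist(frame, code))

def quantization_mse(frames, codebook):
    """Mean squared error after replacing every frame with its nearest code."""
    total = 0.0
    for frame in frames:
        code = nearest_code(frame, codebook)
        total += sum((a - b) ** 2 for a, b in zip(frame, code))
    return total / len(frames)

rng = random.Random(0)
FEATURE_DIM = 4  # hypothetical per-frame feature size
# Hypothetical continuous "motion frames" drawn from a standard Gaussian.
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(200)]
# One random codebook; smaller codebooks are prefixes of it, so each is a subset of the next.
full_codebook = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(512)]

errors = {k: quantization_mse(frames, full_codebook[:k]) for k in (8, 64, 512)}
for k, err in errors.items():
    print(f"codebook size {k:>3}: MSE {err:.4f}")
```

Because each smaller codebook is a subset of the larger ones, the error can only shrink as the codebook grows, yet it stays above zero for continuous inputs that do not coincide with a code. A generator that works directly in continuous space does not incur this rounding step.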

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Hand Gesture Recognition Systems

Methods: Diffusion