Priority-Centric Human Motion Generation in Discrete Latent Space
Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang

TL;DR
This paper introduces a priority-centric diffusion model for text-to-human motion generation that gives precedence to the motions most salient to the text description, resulting in more semantically rich and diverse motions.
Contribution
It proposes a novel discrete diffusion model with a significance-aware noise schedule and a Transformer-based VQ-VAE for improved motion generation.
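To make the idea of a significance-aware noise schedule concrete, here is a minimal sketch of one way such a schedule could behave in a discrete token space: lower-priority tokens are corrupted earlier in the forward process, so the most salient tokens survive longest. This is an illustrative simplification, not the paper's actual schedule. The `MASK` id, the `corrupt` function, the per-token `priority` scores, and the deterministic masking rule are all assumptions made for this example.

```python
MASK = -1  # hypothetical id for a masked (fully corrupted) token


def corrupt(tokens, priority, t, num_steps):
    """Mask the lowest-priority tokens first as t goes from 0 to num_steps.

    Assumption: priority[i] is a precomputed importance score for tokens[i];
    how such scores are obtained is outside the scope of this sketch.
    """
    n_mask = (len(tokens) * t) // num_steps  # number of tokens masked at step t
    order = sorted(range(len(tokens)), key=lambda i: priority[i])
    masked = set(order[:n_mask])  # indices of the least important tokens
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]


tokens = [5, 9, 2, 7]              # toy motion-token indices
priority = [0.1, 0.9, 0.4, 0.6]    # toy importance scores
halfway = corrupt(tokens, priority, t=2, num_steps=4)
print(halfway)
```

At the halfway step the two lowest-priority tokens are masked while the two highest-priority ones are still intact. A reverse process built on this ordering would recover the high-priority tokens first.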
Findings
Outperforms existing methods in fidelity and diversity
Effectively captures salient motions in complex descriptions
Enhances semantic richness of generated motions
Abstract
Text-to-motion generation is a formidable task, aiming to produce human motions that align with the input text while also adhering to human capabilities and physical laws. While there have been advancements in diffusion models, their application in discrete spaces remains underexplored. Current methods often overlook the varying significance of different motions, treating them uniformly. It is essential to recognize that not all motions hold the same relevance to a particular textual description. Some motions, being more salient and informative, should be given precedence during generation. In response, we introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion representation, incorporating a global self-attention mechanism and a regularization term to counteract code collapse. We also present a…
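The discrete representation and the code-collapse problem mentioned in the abstract can be illustrated with a small sketch. It shows generic nearest-neighbor vector quantization, which is the standard VQ-VAE lookup step, together with a codebook-usage perplexity that is commonly used to detect collapse. It does not reproduce the paper's Transformer encoder or its specific regularization term. The function names, the toy 2-D codebook, and the toy frames are assumptions made for this example.

```python
import math
from collections import Counter


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    indices = []
    for f in frames:
        # Squared Euclidean distance from this frame to every code vector.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def codebook_perplexity(indices):
    """exp(entropy) of code usage; a value near 1 signals collapse onto one code."""
    counts = Counter(indices)
    total = len(indices)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]                 # toy codes
frames = [(0.1, -0.1), (0.9, 1.2), (-0.8, 0.9), (1.1, 0.8)]      # toy features
tokens = quantize(frames, codebook)
perplexity = codebook_perplexity(tokens)
print(tokens, perplexity)
```

A perplexity close to the codebook size means the codes are used evenly, while a value of 1 means every frame was mapped to the same code. A regularizer against code collapse would aim to keep this quantity high.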
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications
Methods: ALIGN · VQ-VAE · Diffusion
