Priority-Centric Human Motion Generation in Discrete Latent Space

Hanyang Kong; Kehong Gong; Dongze Lian; Michael Bi Mi; Xinchao Wang

arXiv:2308.14480 · cs.CV · August 31, 2023

TL;DR

This paper introduces a priority-centric diffusion model for text-to-human motion generation that gives precedence to the motions most salient to the text description, producing more semantically rich and diverse motions.

Contribution

It proposes a novel discrete diffusion model with a significance-aware noise schedule and a Transformer-based VQ-VAE for improved motion generation.
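As a rough illustration of what a significance-aware corruption order could look like, the sketch below masks the lowest-priority motion tokens first, so that the highest-priority tokens are the last to be corrupted and would be the first to be recovered when the process is reversed. This is a deterministic toy under stated assumptions, not the paper's schedule: the absorbing `MASK` state, the `corrupt` function, the linear masking rate, and the priority scores are all hypothetical.

```python
MASK = -1  # hypothetical id for an absorbing [MASK] state


def corrupt(tokens, priorities, t, num_steps):
    """Mask the lowest-priority tokens first (toy, deterministic schedule).

    At step t, floor(t * n / num_steps) tokens are masked, so nothing is
    masked at t == 0 and everything is masked at t == num_steps.
    """
    n = len(tokens)
    num_masked = (t * n) // num_steps
    # Indices sorted by ascending priority: least important tokens come first.
    order = sorted(range(n), key=lambda i: priorities[i])
    masked = set(order[:num_masked])
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]


# Assumed example: four motion tokens with made-up priority scores.
tokens = [7, 3, 3, 9]
priorities = [0.9, 0.1, 0.2, 0.8]
for t in range(5):
    print(t, corrupt(tokens, priorities, t, 4))
```

In this toy, the tokens with scores 0.9 and 0.8 survive longest in the forward direction. A real discrete diffusion model would corrupt tokens stochastically through transition matrices rather than by a fixed ranking.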

Findings

01. Outperforms existing methods in fidelity and diversity

02. Effectively captures salient motions in complex descriptions

03. Enhances semantic richness of generated motions

Abstract

Text-to-motion generation is a formidable task, aiming to produce human motions that align with the input text while also adhering to human capabilities and physical laws. While there have been advancements in diffusion models, their application in discrete spaces remains underexplored. Current methods often overlook the varying significance of different motions, treating them uniformly. It is essential to recognize that not all motions hold the same relevance to a particular textual description. Some motions, being more salient and informative, should be given precedence during generation. In response, we introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion representation, incorporating a global self-attention mechanism and a regularization term to counteract code collapse. We also present a…
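To make the idea of a discrete motion representation and code collapse concrete, here is a minimal sketch of the generic vector-quantization step used in VQ-VAE-style models: each continuous feature is replaced by the index of its nearest codebook entry, and the perplexity of code usage gives a simple indication of collapse. It does not reproduce the paper's Transformer encoder or its regularization term; the function names, the toy codebook, and the features are assumptions made for illustration.

```python
import math


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def usage_perplexity(indices, codebook_size):
    """exp(entropy) of code usage: 1 when a single code is used, codebook_size when usage is uniform."""
    counts = [0] * codebook_size
    for i in indices:
        counts[i] += 1
    total = len(indices)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(entropy)


# Assumed toy data: a 3-entry codebook and four 2-D "motion features".
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.0], [1.1, 0.8]]
tokens = quantize(features, codebook)
print(tokens)                                   # [0, 1, 0, 1]
print(usage_perplexity(tokens, len(codebook)))  # close to 2: one code is never used
```

A perplexity well below the codebook size means some codes are never selected, which is the symptom a collapse-countering regularizer is meant to reduce.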

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications

Methods: ALIGN · VQ-VAE · Diffusion