M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

Seunggeun Chi, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani, and Kwonjoon Lee

arXiv:2407.14502 · cs.CV · July 22, 2024

TL;DR

M2D2M is a discrete diffusion model that generates coherent multi-action human motion sequences from text, producing smooth transitions between actions while following the semantics of each action description.

Contribution

The paper introduces M2D2M, a discrete diffusion approach for generating multi-motion sequences from text, built on a dynamic transition probability and a two-phase sampling strategy.
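The abstract says the transition probability "adapts ... based on the proximity between motion tokens." The sketch below shows one way a proximity-aware transition matrix for a discrete diffusion forward process could be built. It is an illustration under stated assumptions, not the paper's formula: the names `proximity_transition_matrix` and `forward_step`, the parameters `keep_prob` and `temperature`, and the softmax over negative squared codebook distances are all hypothetical choices.

```python
import numpy as np


def proximity_transition_matrix(codebook: np.ndarray, keep_prob: float, temperature: float) -> np.ndarray:
    """Hypothetical proximity-aware transition matrix Q of shape (K, K).

    Row i is q(x_t = j | x_{t-1} = i): the token stays with probability keep_prob,
    and the remaining mass goes to other tokens with weights that decay with the
    squared Euclidean distance between their codebook vectors (an assumption).
    """
    # Pairwise squared distances between the K codebook entries.
    diff = codebook[:, None, :] - codebook[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Softmax over negative distances, excluding self-transitions.
    logits = -sq_dist / temperature
    np.fill_diagonal(logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Mix "stay" with the proximity-weighted "move" distribution; rows sum to 1.
    k = codebook.shape[0]
    return keep_prob * np.eye(k) + (1.0 - keep_prob) * weights


def forward_step(tokens: np.ndarray, q: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Corrupt each token x_{t-1} by sampling x_t from row x_{t-1} of Q."""
    k = q.shape[0]
    return np.array([rng.choice(k, p=q[tok]) for tok in tokens])


rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))  # toy codebook: 8 tokens, 4-dim embeddings
q = proximity_transition_matrix(codebook, keep_prob=0.9, temperature=1.0)
noisy = forward_step(np.array([0, 3, 5, 7]), q, rng)
```

Compared with a uniform transition matrix, this construction biases corruption toward nearby codebook entries. Whether M2D2M favors near or far tokens, and how that preference changes across timesteps, should be checked against the paper itself.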

Findings

01

Outperforms state-of-the-art methods in motion quality on benchmark evaluations.

02

Produces long-term, smooth, and coherent motion sequences.

03

Effectively interprets language semantics for motion generation.

Abstract

We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from textual descriptions of multiple actions, utilizing the strengths of discrete diffusion models. This approach adeptly addresses the challenge of generating multi-motion sequences, ensuring seamless transitions of motions and coherence across a series of actions. The strength of M2D2M lies in its dynamic transition probability within the discrete diffusion model, which adapts transition probabilities based on the proximity between motion tokens, encouraging mixing between different modes. Complemented by a two-phase sampling strategy that includes independent and joint denoising steps, M2D2M effectively generates long-term, smooth, and contextually coherent human motion sequences, utilizing a model trained for single-motion generation. Extensive experiments demonstrate that…
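The two-phase sampling strategy (independent denoising steps, then joint denoising steps, using a model trained for single-motion generation) can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not state: noise is an absorbing mask token (`MASK`), each action's text is reduced to an integer id, and in the joint phase every position is conditioned on its own action's text. The names `two_phase_sample`, `toy_denoise`, `num_independent`, and the `Denoiser` signature are hypothetical, and `toy_denoise` is only a stand-in for a trained model.

```python
import numpy as np
from typing import Callable, List, Sequence

# Hypothetical denoiser interface (not M2D2M's actual API): maps
# (tokens, per-position condition ids, timestep t) to a less-noisy sequence.
Denoiser = Callable[[np.ndarray, np.ndarray, int], np.ndarray]


def two_phase_sample(
    segment_lengths: Sequence[int],
    text_ids: Sequence[int],
    denoise: Denoiser,
    num_steps: int,
    num_independent: int,
    mask_id: int,
) -> np.ndarray:
    """Sketch of independent-then-joint denoising for a multi-action prompt."""
    # Start every action segment fully masked (assumed absorbing-state noise).
    segments: List[np.ndarray] = [np.full(n, mask_id) for n in segment_lengths]
    conds = [np.full(n, tid) for n, tid in zip(segment_lengths, text_ids)]

    # Phase 1: each segment is denoised on its own, conditioned only on its text.
    for t in range(num_steps, num_steps - num_independent, -1):
        segments = [denoise(seg, c, t) for seg, c in zip(segments, conds)]

    # Phase 2: concatenate the segments and denoise the whole sequence jointly,
    # so tokens near segment boundaries can adapt to their neighbours.
    tokens = np.concatenate(segments)
    cond = np.concatenate(conds)
    for t in range(num_steps - num_independent, 0, -1):
        tokens = denoise(tokens, cond, t)
    return tokens


MASK = -1


def toy_denoise(tokens: np.ndarray, cond: np.ndarray, t: int) -> np.ndarray:
    """Stand-in denoiser: reveals about 1/t of the remaining masked positions."""
    out = tokens.copy()
    masked = np.flatnonzero(out == MASK)
    k = int(np.ceil(len(masked) / t))  # at t == 1 every remaining mask is revealed
    out[masked[:k]] = cond[masked[:k]]
    return out


result = two_phase_sample([4, 6], [0, 1], toy_denoise, num_steps=5, num_independent=2, mask_id=MASK)
print(result)  # [0 0 0 0 1 1 1 1 1 1]
```

The point of the split is that the early steps reuse the single-motion model unchanged for each action, while the later steps see the full sequence and can smooth the transitions between actions. A real denoiser would also revise already-revealed tokens; the toy one does not.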

Taxonomy

Topics: Natural Language Processing Techniques · Human Motion and Animation · Speech Recognition and Synthesis

Methods: Diffusion