M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis

Zhizhuo Yin; Yuk Hang Tsui; Pan Hui

arXiv:2505.08293·cs.GR·May 20, 2025

TL;DR

This paper introduces M3G, a multi-granular framework for generating natural full-body human gestures from audio by modeling gesture patterns at different temporal granularities.

Contribution

The paper proposes a novel multi-granular tokenization and prediction framework for audio-driven gesture synthesis, addressing the fixed-granularity limitation of prior methods.

Findings

01 Outperforms state-of-the-art in naturalness and expressiveness

02 Effective multi-granular tokenization of motion patterns

03 Improved gesture diversity and realism

Abstract

Generating full-body human gestures encompassing face, body, hands, and global movements from audio is a valuable yet challenging task in virtual avatar creation. Previous systems tokenize human gestures frame by frame and predict the tokens of each frame from the input audio. However, the number of frames required for a complete expressive human gesture, defined as its granularity, varies among gesture patterns. Existing systems fail to model these patterns because their gesture tokens have a fixed granularity. To solve this problem, we propose a novel framework named Multi-Granular Gesture Generator (M3G) for audio-driven holistic gesture generation. In M3G, we propose a novel Multi-Granular VQ-VAE (MGVQ-VAE) to tokenize motion patterns and reconstruct motion sequences from different temporal granularities. Subsequently, we…
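To make the notion of granularity concrete, here is a minimal sketch of quantizing one motion sequence at several window lengths. It is not the MGVQ-VAE from the paper: there is no learned encoder or decoder, the codebooks are random rather than trained, and every name (`tokenize`, `detokenize`, `DIM`, `FRAMES`, `CODES`) and the choice of granularities 1, 2, and 4 are assumptions made for illustration only. It shows plain nearest-neighbour vector quantization over non-overlapping windows of frames.

```python
import random


def tokenize(motion, codebook, granularity):
    """Map each non-overlapping window of `granularity` frames to the index
    of its nearest codebook entry (squared Euclidean distance).
    Trailing frames that do not fill a whole window are dropped."""
    n_windows = len(motion) // granularity
    tokens = []
    for w in range(n_windows):
        frames = motion[w * granularity:(w + 1) * granularity]
        window = [x for frame in frames for x in frame]  # flatten the window
        dists = [sum((a - b) ** 2 for a, b in zip(window, code))
                 for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def detokenize(tokens, codebook, granularity, dim):
    """Rebuild a frame sequence by looking up each token and splitting the
    flattened code back into `granularity` frames of `dim` values."""
    motion = []
    for t in tokens:
        code = codebook[t]
        for i in range(granularity):
            motion.append(code[i * dim:(i + 1) * dim])
    return motion


# Hypothetical toy setup: 8 frames of a 3-dimensional pose vector.
random.seed(0)
DIM, FRAMES, CODES = 3, 8, 4
motion = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(FRAMES)]

# One random (untrained) codebook per assumed granularity.
codebooks = {
    g: [[random.gauss(0, 1) for _ in range(g * DIM)] for _ in range(CODES)]
    for g in (1, 2, 4)
}
tokens = {g: tokenize(motion, codebooks[g], g) for g in codebooks}
```

With 8 input frames, `tokens[1]`, `tokens[2]`, and `tokens[4]` hold 8, 4, and 2 tokens respectively, so a single coarse token spans as many frames as several fine ones. A fixed-granularity tokenizer corresponds to keeping only one of these entries.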

Taxonomy

Methods: VQ-VAE