A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis

Zixiang Zhou; Yu Wan; Baoyuan Wang

arXiv:2311.16471·cs.CV·November 29, 2023·2 cites

Open Access

TL;DR

This paper presents a scalable, unified framework for synthesizing multimodal, multi-part human motion. It quantizes the motion of each body part into discrete tokens, encodes control signals (text, music, speech) with pre-trained models, and predicts motion tokens one after another.

Contribution

The paper introduces a token-prediction approach that unifies multimodal and multi-part human motion synthesis, which makes the framework easier to scale and to extend with new modalities.

Findings

01 Effective in generating realistic multi-part motions

02 Scalable framework easily incorporates new modalities

03 Demonstrates broad applicability through extensive experiments

Abstract

The field has made significant progress in synthesizing realistic human motion driven by various modalities. Yet, the need for different methods to animate various body parts according to different control signals limits the scalability of these techniques in practical scenarios. In this paper, we introduce a cohesive and scalable approach that consolidates multimodal (text, music, speech) and multi-part (hand, torso) human motion generation. Our methodology unfolds in several steps: We begin by quantizing the motions of diverse body parts into separate codebooks tailored to their respective domains. Next, we harness the robust capabilities of pre-trained models to transcode multimodal signals into a shared latent space. We then translate these signals into discrete motion tokens by iteratively predicting subsequent tokens to form a complete sequence. Finally, we reconstruct the…
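The first and third steps of the abstract (per-part quantization and iterative token prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-neighbor lookup stands in for a learned quantizer, greedy decoding is an assumed choice (the paper may sample instead), and every name here (`quantize`, `dequantize`, `generate_tokens`, `make_toy_predictor`) is hypothetical, as are the tiny hand-written codebooks.

```python
import math

def quantize(frames, codebook):
    """Map each motion frame (a feature vector) to the index of its
    nearest codebook entry under Euclidean distance."""
    tokens = []
    for frame in frames:
        dists = [math.dist(frame, code) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

def dequantize(tokens, codebook):
    """Look each token up in the codebook to recover an approximate frame."""
    return [codebook[t] for t in tokens]

def generate_tokens(next_token_scores, condition, length):
    """Greedy autoregressive decoding: at every step, score all candidate
    tokens given the condition and the tokens produced so far, then append
    the highest-scoring one."""
    tokens = []
    for _ in range(length):
        scores = next_token_scores(condition, tokens)
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens

def make_toy_predictor(vocab_size):
    """Hypothetical stand-in for a learned model. It prefers token
    (condition + number of tokens so far) modulo vocab_size."""
    def scores(condition, prefix):
        preferred = (condition + len(prefix)) % vocab_size
        return [1.0 if i == preferred else 0.0 for i in range(vocab_size)]
    return scores

# Assumed toy codebooks: one per body part, each with its own feature size.
CODEBOOKS = {
    "hand": [(0.0, 0.0), (1.0, 1.0)],
    "torso": [(0.0,), (0.5,), (1.0,)],
}

if __name__ == "__main__":
    hand_frames = [(0.1, -0.1), (0.9, 1.2), (0.8, 0.7)]
    hand_tokens = quantize(hand_frames, CODEBOOKS["hand"])
    print("hand tokens:", hand_tokens)
    print("reconstructed:", dequantize(hand_tokens, CODEBOOKS["hand"]))

    predictor = make_toy_predictor(len(CODEBOOKS["torso"]))
    torso_tokens = generate_tokens(predictor, condition=1, length=4)
    print("torso tokens:", torso_tokens)
```

Keeping one codebook per body part means each part's tokens can be predicted and decoded independently, which is one plausible reading of how separate codebooks help a single framework cover several parts.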

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Hand Gesture Recognition Systems