Generating Holistic 3D Human Motion from Speech
Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J. Black

TL;DR
This paper introduces a framework for generating realistic and diverse 3D holistic human motion (body poses, hand gestures, and facial expressions) from speech. It combines a new dataset of 3D holistic body meshes with synchronous speech, an autoencoder for face motion, and a compositional VQ-VAE for body and hand motion.
Contribution
It presents a new speech-to-motion generation framework with separate modeling of face, body, and hands, and introduces a high-quality dataset for holistic 3D human motion synthesis from speech.
Findings
Achieves state-of-the-art qualitative and quantitative results.
Generates diverse and coherent 3D human motions from speech.
Provides a new dataset and code for future research.
Abstract
This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately. The separated modeling stems from the fact that face articulation strongly correlates with human speech, while body poses and hand gestures are less correlated. Specifically, we employ an autoencoder for face motions, and a compositional vector-quantized variational autoencoder (VQ-VAE) for the body and hand motions. The compositional VQ-VAE is key to generating diverse results. Additionally, we propose a cross-conditional autoregressive…
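The quantization step at the core of a VQ-VAE can be illustrated with a minimal sketch. It assumes that "compositional" means the body and the hands each get their own codebook, so that codes can be combined freely; the abstract does not spell this out. The function name `quantize`, the codebook sizes, and the latent dimension are hypothetical, and random vectors stand in for the outputs of learned encoders. This is not the authors' implementation.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (squared L2 distance).

    latents:  array of shape (T, D), one latent per time step
    codebook: array of shape (K, D), K entries
    Returns the chosen indices (T,) and the quantized vectors (T, D).
    """
    diffs = latents[:, None, :] - codebook[None, :, :]  # (T, K, D)
    dists = (diffs ** 2).sum(axis=-1)                   # (T, K)
    indices = dists.argmin(axis=1)                      # (T,)
    return indices, codebook[indices]

# Hypothetical sizes, chosen only for illustration.
T, D = 4, 8
K_BODY, K_HAND = 16, 16

rng = np.random.default_rng(0)
body_codebook = rng.normal(size=(K_BODY, D))
hand_codebook = rng.normal(size=(K_HAND, D))

# Stand-ins for encoder outputs; a real model would learn these.
body_latents = rng.normal(size=(T, D))
hand_latents = rng.normal(size=(T, D))

# Assumption: body and hands are quantized independently with separate codebooks.
body_idx, body_q = quantize(body_latents, body_codebook)
hand_idx, hand_q = quantize(hand_latents, hand_codebook)

# Compose the two quantized parts into one per-frame representation.
composed = np.concatenate([body_q, hand_q], axis=-1)    # (T, 2 * D)

# Two codebooks can express K_BODY * K_HAND distinct (body, hand) pairs per frame.
num_combinations = K_BODY * K_HAND
```

Under this assumption, two codebooks of 16 entries each store 32 vectors but can represent 256 distinct body-hand pairs per frame, which is one plausible reading of why the abstract credits the compositional design with diverse results.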
Taxonomy
Topics: Face recognition and analysis · Human Pose and Action Recognition · Human Motion and Animation
Methods: VQ-VAE
