Generating Holistic 3D Human Motion from Speech

Hongwei Yi; Hualin Liang; Yifei Liu; Qiong Cao; Yandong Wen; Timo Bolkart; Dacheng Tao; Michael J. Black

arXiv:2212.04420·cs.CV·June 21, 2023



TL;DR

This paper introduces a framework for generating realistic and diverse 3D holistic human motion (body poses, hand gestures, and facial expressions) from speech. It builds a new dataset of 3D holistic body meshes with synchronous speech, and models face motion with an autoencoder and body and hand motion with a compositional VQ-VAE.

Contribution

The paper presents a speech-to-motion generation framework that models the face, body, and hands separately, and introduces a high-quality dataset of 3D holistic body meshes with synchronous speech.

Findings

01 Achieves state-of-the-art qualitative and quantitative results.

02 Generates diverse and coherent 3D human motions from speech.

03 Provides a new dataset and code for future research.

Abstract

This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately. The separated modeling stems from the fact that face articulation strongly correlates with human speech, while body poses and hand gestures are less correlated. Specifically, we employ an autoencoder for face motions, and a compositional vector-quantized variational autoencoder (VQ-VAE) for the body and hand motions. The compositional VQ-VAE is key to generating diverse results. Additionally, we propose a cross-conditional autoregressive…
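The quantization step behind a compositional VQ-VAE can be illustrated with a minimal sketch. It assumes that "compositional" means body and hand latents are quantized against separate codebooks, which the abstract does not spell out. All function names, codebook sizes, and toy values below are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(z: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to z (squared L2)."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(z, codebook[k]))


def quantize(latents: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to its nearest code index and code vector."""
    indices = [nearest_code(z, codebook) for z in latents]
    return indices, [list(codebook[k]) for k in indices]


def compositional_quantize(body_latents, hand_latents,
                           body_codebook, hand_codebook):
    """Quantize body and hand latents independently (assumed design).

    Each frame is then described by a (body index, hand index) pair.
    """
    body_idx, body_q = quantize(body_latents, body_codebook)
    hand_idx, hand_q = quantize(hand_latents, hand_codebook)
    return list(zip(body_idx, hand_idx)), body_q, hand_q


# Toy codebooks and encoder outputs (hypothetical values, two frames).
body_codebook = [[0.0, 0.0], [1.0, 1.0]]
hand_codebook = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
body_latents = [[0.1, -0.2], [0.9, 1.2]]
hand_latents = [[0.9, 0.1], [0.4, 0.6]]

pairs, body_q, hand_q = compositional_quantize(
    body_latents, hand_latents, body_codebook, hand_codebook
)
print(pairs)  # [(0, 1), (1, 2)]
```

Under this assumption, a body codebook of size 2 and a hand codebook of size 3 can represent 2 × 3 = 6 distinct per-frame combinations. That is one plausible reading of why the abstract calls the compositional design key to diverse results.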


Taxonomy

Topics: Face recognition and analysis · Human Pose and Action Recognition · Human Motion and Animation

Methods: VQ-VAE