Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates

Shenhan Qian; Zhi Tu; Yihao Zhi; Wen Liu; Shenghua Gao

arXiv:2108.08020·cs.CV·November 30, 2021

TL;DR

This paper introduces a co-speech gesture synthesis method that learns gesture templates to improve realism and synchronization, effectively combining speech-driven subtle movements with template-based general gestures.

Contribution

The method learns gesture templates to model gesture variability, enhancing realism and synchronization in co-speech gesture generation.

Findings

1. Outperforms existing methods in fidelity and synchronization
2. Uses lip-sync error as a proxy metric for evaluation
3. Generates complete upper-body gestures including arms, hands, and head

Abstract

Co-speech gesture generation aims to synthesize a gesture sequence that not only looks real but also matches the input speech audio. Our method generates the movements of a complete upper body, including arms, hands, and the head. Although recent data-driven methods achieve great success, challenges still exist, such as limited variety, poor fidelity, and a lack of objective metrics. Motivated by the fact that speech cannot fully determine the gesture, we design a method that learns a set of gesture template vectors to model the latent conditions, which relieves the ambiguity. In our method, the template vector determines the general appearance of a generated gesture sequence, while the speech audio drives subtle movements of the body; both are indispensable for synthesizing a realistic gesture sequence. Due to the intractability of an objective metric for gesture-speech…
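The split of roles described in the abstract (a template vector fixing the general appearance, audio driving the subtle motion) can be illustrated with a toy sketch. This is not the authors' architecture: the linear maps, the random weights, the dimensions, and the function name `synthesize_gesture` are all assumptions made for illustration, and a real model would learn its parameters from data.

```python
import numpy as np

def synthesize_gesture(audio_feats, template, w_audio, w_template):
    """Toy conditional generator (hypothetical, not the paper's network).

    audio_feats: (T, audio_dim) per-frame audio features
    template:    (template_dim,) one template vector for the whole sequence
    returns:     (T, pose_dim) pose sequence
    """
    # Template term: constant over time, so it sets the overall pose.
    base = template @ w_template            # (pose_dim,)
    # Audio term: changes every frame, adding small speech-driven motion.
    motion = audio_feats @ w_audio          # (T, pose_dim)
    return base[None, :] + motion

rng = np.random.default_rng(0)
# Hypothetical sizes: 64 frames, 10 two-dimensional keypoints.
T, audio_dim, template_dim, pose_dim = 64, 16, 8, 20
audio_feats = rng.standard_normal((T, audio_dim))
w_audio = 0.05 * rng.standard_normal((audio_dim, pose_dim))   # small scale: subtle motion
w_template = rng.standard_normal((template_dim, pose_dim))

# Same speech, two different template vectors.
template_a = rng.standard_normal(template_dim)
template_b = rng.standard_normal(template_dim)
poses_a = synthesize_gesture(audio_feats, template_a, w_audio, w_template)
poses_b = synthesize_gesture(audio_feats, template_b, w_audio, w_template)
```

In this sketch, the two outputs differ in their time-averaged pose but share identical frame-to-frame motion, which mirrors the idea that one speech clip can plausibly accompany several different gesture sequences.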

Code & Models

Repositories

shenhanqian/speechdrivestemplates
PyTorch · Official

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Human Motion and Animation