Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning

Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, and Dinesh Manocha

arXiv:2108.00262 · cs.MM · November 26, 2024

TL;DR

This paper introduces a GAN-based method for synthesizing 3D co-speech gestures that express appropriate affective cues, improving realism and emotional alignment over previous approaches.

Contribution

It proposes a novel affective encoder with multi-scale spatial-temporal graph convolutions integrated into a GAN for more expressive gesture synthesis from speech.
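To make "multi-scale spatial-temporal graph convolution" concrete, here is a minimal plain-Python sketch of one such layer. It is an assumption-laden illustration, not the authors' encoder: it assumes scale k aggregates over the k-th power of the row-normalized skeleton adjacency (with self-loops), sums one learned weight matrix per scale, and then applies a 1-D convolution along time for each joint. The function names, the toy 3-joint chain, and the weights are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Row-normalize A + I so each joint averages over itself and its neighbors."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in a]


def adjacency_powers(a_hat: Matrix, num_scales: int) -> List[Matrix]:
    """Assumed multi-scale scheme: scale k uses a_hat**k (joints up to k hops away)."""
    powers = [a_hat]
    for _ in range(num_scales - 1):
        powers.append(matmul(powers[-1], a_hat))
    return powers


def ms_st_graph_conv(x: List[Matrix], adj: Matrix, weights: List[Matrix],
                     temporal_kernel: List[float]) -> List[Matrix]:
    """x: T frames, each an (N joints x C) matrix; weights[k]: (C x C_out) for scale k+1.
    Returns T frames of (N x C_out) features."""
    scales = adjacency_powers(normalized_adjacency(adj), len(weights))
    # Spatial step: for every frame, sum over scales of A_hat^k @ X_t @ W_k.
    spatial = []
    for frame in x:
        out = None
        for a_k, w_k in zip(scales, weights):
            term = matmul(matmul(a_k, frame), w_k)
            out = term if out is None else [[u + v for u, v in zip(r1, r2)]
                                            for r1, r2 in zip(out, term)]
        spatial.append(out)
    # Temporal step: 1-D cross-correlation along time (as in deep-learning "conv"),
    # zero padding and an odd kernel length so the number of frames is preserved.
    t_len, half = len(spatial), len(temporal_kernel) // 2
    n, c_out = len(spatial[0]), len(spatial[0][0])
    result = []
    for t in range(t_len):
        frame = [[0.0] * c_out for _ in range(n)]
        for offset, w in enumerate(temporal_kernel):
            s = t + offset - half
            if 0 <= s < t_len:
                for j in range(n):
                    for c in range(c_out):
                        frame[j][c] += w * spatial[s][j][c]
        result.append(frame)
    return result


# Hypothetical 3-joint chain (e.g. spine - neck - head), 4 frames, 1 channel per joint.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
frames = [[[1.0], [1.0], [1.0]] for _ in range(4)]
y = ms_st_graph_conv(frames, chain, weights=[[[1.0]], [[1.0]]], temporal_kernel=[1.0, 1.0, 1.0])
print([round(f[0][0], 6) for f in y])  # → [4.0, 6.0, 6.0, 4.0]
```

Because each normalized adjacency power is row-stochastic, a constant input stays constant through the spatial step (here 1 + 1 = 2 across two scales), so the printed values only reflect the temporal kernel's zero padding at the sequence ends.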

Findings

01

Improved the mean absolute joint error by 10–33% over baselines

02

Produced affective expressions that user-study participants judged more appropriate

03

Improved gesture realism metrics such as the Fréchet Gesture Distance
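The joint-error finding compares synthesized poses against ground-truth poses and reports a relative reduction. The sketch below shows one plausible way to compute both numbers; it assumes the error is the Euclidean distance between matching 3D joints averaged over all frames and joints (some implementations average per-coordinate absolute differences instead), and the function names and toy values are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence

Pose = Sequence[Sequence[float]]  # one frame: N joints, each an (x, y, z) position


def mean_joint_error(pred: Sequence[Pose], target: Sequence[Pose]) -> float:
    """Average Euclidean distance between matching joints over all frames and joints."""
    if len(pred) != len(target):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p_joint, t_joint in zip(p_frame, t_frame):
            total += math.dist(p_joint, t_joint)
            count += 1
    return total / count


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction in error relative to the baseline (positive means better)."""
    return 100.0 * (baseline - ours) / baseline


# Toy single-frame example with two joints; only the first joint is off (by a 3-4-5 offset).
target = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]]
print(mean_joint_error(pred, target))                # → 2.5
print(relative_improvement(baseline=2.0, ours=1.5))  # → 25.0
```

Under this reading, a "10–33%" improvement means the method's error is between 67% and 90% of the corresponding baseline's error.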

Abstract

We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions. Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences. We leverage the Mel-frequency cepstral coefficients and the text transcript computed from the input speech in separate encoders in our generator to learn the desired sentiments and the associated affective cues. We design an affective encoder using multi-scale spatial-temporal graph convolutions to transform 3D pose sequences into latent, pose-based affective features. We use our affective encoder in both our generator, where it learns affective features from the seed poses to…
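The abstract describes a discriminator that distinguishes synthesized pose sequences from real ones. As a hedged illustration of that adversarial setup, the sketch below implements the standard binary cross-entropy GAN losses (with the common non-saturating generator loss), assuming the discriminator outputs one probability per pose sequence. The paper's exact loss terms, including any reconstruction or affective terms, are not given in this excerpt, so this should not be read as the authors' objective; the function names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0) when the discriminator is fully confident


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy: push D(real sequence) toward 1 and D(synthesized) toward 0."""
    real = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: maximize log D(G(speech, seed poses))."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# An undecided discriminator (0.5 everywhere) gives a loss of 2 * ln 2.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 4))  # → 1.3863
```

Training alternates between the two: the discriminator minimizes its loss on batches of real and synthesized sequences, then the generator minimizes its loss so that its sequences are scored as real.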


Code & Models

Repositories

UttaranB127/speech2affective_gestures
PyTorch · Official
