Integrated Speech and Gesture Synthesis

Siyang Wang; Simon Alexanderson; Joakim Gustafson; Jonas Beskow; Gustav Eje Henter; Éva Székely

arXiv:2108.11436·cs.HC·August 27, 2021

TL;DR

This paper introduces a unified neural model that synthesizes speech and gesture together in a single system, matching the rated quality of pipeline approaches while synthesizing faster and using fewer parameters.

Contribution

The authors propose a set of integrated speech and gesture synthesis models, modified from state-of-the-art neural speech-synthesis engines, and demonstrate quality comparable to pipeline systems with faster synthesis and fewer parameters.

Findings

01

Participants rated the integrated model as comparable to state-of-the-art pipeline systems.

02

The integrated model achieved faster synthesis times than the pipeline systems.

03

The model used significantly fewer parameters than traditional pipelines.

Abstract

Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline. This can lead to modeling inefficiencies and may introduce inconsistencies that limit the achievable naturalness. We propose to instead synthesize the two modalities in a single model, a new problem we call integrated speech and gesture synthesis (ISG). We also propose a set of models modified from state-of-the-art neural speech-synthesis engines to achieve this goal. We evaluate the models in three carefully-designed user studies, two of which evaluate the synthesized speech and gesture in isolation, plus a combined study that evaluates the models like they will be used in real-world applications -- speech and gesture presented together. The results show that…

Code & Models

Repositories

swatsw/isg_official
pytorch
