TL;DR
This paper introduces a GAN-based method that generates realistic gestures from speech for virtual avatars. In a Turing-test-inspired user study, participants could not distinguish generated gestures from recorded ones, and the model captured speaker-specific correlations between speech and gesture.
Contribution
A novel data-driven approach using GANs to synthesize speech-related gestures without specialized hardware, capturing speaker-specific non-verbal cues.
Findings
Users cannot distinguish generated from recorded gestures
Generated gestures are perceived as related to speech
Model captures speaker-specific gesture correlations
Abstract
People communicate using both speech and non-verbal signals such as gestures, facial expressions or body pose. Non-verbal signals impact the meaning of the spoken utterance in an abundance of ways. An absence of non-verbal signals impoverishes the process of communication. Yet, when users are represented as avatars, it is difficult to translate non-verbal signals along with the speech into the virtual world without specialized motion-capture hardware. In this paper, we propose a novel, data-driven technique for generating gestures directly from speech. Our approach is based on the application of Generative Adversarial Neural Networks (GANs) to model the correlation rather than causation between speech and gestures. This approach approximates neuroscience findings on how non-verbal communication and speech are correlated. We create a large dataset which consists of speech and corresponding…
