Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation

Taras Kucherenko; Dai Hasegawa; Naoshi Kaneko; Gustav Eje Henter; Hedvig Kjellström

arXiv:2007.09170 · cs.CV · April 7, 2021

TL;DR

This paper presents a speech-driven gesture generation framework that incorporates representation learning. Compared with the baseline, it improves motion dynamics and perceived naturalness, and the analysis highlights the role of post-processing (smoothing) in gesture synthesis.

Contribution

The paper extends deep-learning methods for gesture generation by analyzing input/output representations and the impact of post-processing, demonstrating improved naturalness and motion quality.

Findings

01

Generated gestures capture motion dynamics better and match the motion-speed distribution more closely than the baseline.

02

User studies show increased perceived naturalness of gestures.

03

Smoothing the generated motion as a post-processing step enhances gesture quality.
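To make the smoothing finding concrete, here is a minimal sketch of one common post-processing choice: a moving average applied along the time axis of a gesture sequence. The function name `smooth_motion`, the `(frames, joints, 3)` array layout, and the default window of 5 frames are all assumptions for illustration; the page does not state which filter or window the authors used.

```python
import numpy as np


def smooth_motion(motion: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing over time (hypothetical helper).

    motion: array of shape (T, J, 3), i.e. T frames of J joints in 3D.
    window: odd number of frames to average over (an assumed default).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    # Repeat the first and last frames so the output keeps T frames.
    padded = np.pad(motion, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Filter every coordinate track independently along the time axis.
    return np.apply_along_axis(
        lambda track: np.convolve(track, kernel, mode="valid"), 0, padded
    )
```

A larger window removes more jitter but also flattens fast gestures, which is the trade-off such a post-processing analysis has to weigh.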

Abstract

This paper presents a novel framework for speech-driven gesture production, applicable to virtual agents to enhance human-computer interaction. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. We provide an analysis of different representations for the input (speech) and the output (motion) of the network by both objective and subjective evaluations. We also analyse the importance of smoothing of the produced motion. Our results indicated that the proposed method improved on our baseline in terms of objective measures. For example, it better captured the motion dynamics and better matched the motion-speed distribution. Moreover, we performed user studies on two different…
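As a rough illustration of what comparing motion-speed distributions can involve, the sketch below computes a normalized histogram of joint speeds from a sequence of 3D coordinates. The helper name `speed_histogram`, the frame rate of 20 fps, the bin edges, and the `(frames, joints, 3)` layout are assumptions chosen for the example, not details taken from the paper.

```python
import numpy as np


def speed_histogram(motion, fps=20.0, bin_edges=None):
    """Normalized histogram of per-joint speeds (hypothetical helper).

    motion: array of shape (T, J, 3), i.e. T frames of J joints in 3D.
    fps: frame rate used to turn frame differences into speeds (assumed).
    bin_edges: histogram bin edges in units per second (assumed default).
    """
    if bin_edges is None:
        bin_edges = np.linspace(0.0, 50.0, 26)
    # Finite-difference velocity between consecutive frames: (T-1, J, 3).
    velocity = np.diff(motion, axis=0) * fps
    # Speed is the Euclidean norm of each joint's velocity vector.
    speed = np.linalg.norm(velocity, axis=-1)
    counts, _ = np.histogram(speed, bins=bin_edges)
    total = counts.sum()
    # Return fractions so sequences of different length are comparable.
    return counts / total if total > 0 else counts.astype(float)
```

Histograms computed this way for generated and recorded motion can then be compared bin by bin, for example with a summed absolute difference.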


Code & Models

Repositories

GestureGeneration/Speech_driven_gesture_generation_with_autoencoder
tf · Official
