DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao

TL;DR
DiffuseStyleGesture employs diffusion models with attention mechanisms to generate high-quality, stylized, and diverse co-speech gestures that match speech rhythm and semantics, advancing automatic gesture synthesis.
Contribution
It introduces a diffusion-based approach with attention mechanisms and style control for speech-driven gesture generation, improving realism and diversity.
Findings
Outperforms recent methods in gesture quality and diversity
Generates speech-matched and stylized gestures effectively
Enables style control through interpolation and extrapolation
Abstract
Communication is carried not only by speech but also by gestures. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task because gestures are diverse and because the rhythm and semantics of a gesture are hard to match to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion-model-based approach to speech-driven gesture generation. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures from speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention into the gesture diffusion pipeline to generate more realistic gestures that better match the speech. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of generated gestures…
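To make the style-control idea concrete, the sketch below shows the generic classifier-free guidance combination of an unconditional and a style-conditioned prediction. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `guided_prediction`, the flat list representation of a prediction, and the parameter `scale` are all hypothetical, and the exact formulation used in the paper is not given in this summary.

```python
from typing import List


def guided_prediction(
    pred_uncond: List[float],
    pred_cond: List[float],
    scale: float,
) -> List[float]:
    """Blend two model predictions in the classifier-free guidance form.

    Hypothetical helper for illustration only.
    pred_uncond: prediction made with the style condition dropped.
    pred_cond:   prediction made with the style condition present.
    scale:       guidance weight; 0 returns the unconditional prediction,
                 1 returns the conditional one, values between 0 and 1
                 interpolate, and values above 1 extrapolate past the
                 conditional prediction.
    """
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    # uncond + scale * (cond - uncond), applied element by element
    return [u + scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


if __name__ == "__main__":
    neutral = [0.0, 2.0]   # toy stand-in for an unconditional prediction
    styled = [1.0, 4.0]    # toy stand-in for a style-conditioned prediction
    for s in (0.0, 0.5, 1.0, 2.0):
        print(s, guided_prediction(neutral, styled, s))
```

In a diffusion sampler this combination would typically be applied at every denoising step, so a single scalar at inference time can soften or exaggerate the style without retraining.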
Taxonomy
Topics: Human Motion and Animation · Hand Gesture Recognition Systems · Human Pose and Action Recognition
