TL;DR
This paper presents DiffuseStyleGesture+, a diffusion-model-based system for generating realistic conversational gestures for embodied agents. Evaluated in the GENEA Challenge 2023, it performs competitively with the top-ranked systems.
Contribution
The paper introduces DiffuseStyleGesture+, a diffusion-model approach to gesture generation that integrates multiple input modalities and performs on par with the top-tier systems in the GENEA Challenge 2023.
Findings
Performs on par with top models in human-likeness and appropriateness
Uses multimodal inputs including audio, text, speaker ID, and seed gestures
Achieves competitive results in gesture generation for conversational agents
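The multimodal conditioning listed above can be pictured as projecting each input into a shared hidden space and combining the results. The sketch below is a minimal, hypothetical illustration that assumes per-modality linear projections and simple concatenation. `ModalityFusion`, `LinearProjection`, `HIDDEN_DIM`, and all feature sizes are invented for illustration, and the random weights stand in for learned encoders. It is not the authors' architecture.

```python
import random

HIDDEN_DIM = 8  # hypothetical size of each modality's hidden projection


class LinearProjection:
    """Bias-free linear layer with random weights (a stand-in for a learned encoder)."""

    def __init__(self, in_dim, out_dim, rng):
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, x):
        # Matrix-vector product: one output per weight row.
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


class ModalityFusion:
    """Hypothetical fusion of audio, text, speaker ID and seed-gesture features."""

    def __init__(self, audio_dim, text_dim, num_speakers, gesture_dim, seed=0):
        rng = random.Random(seed)  # seeded so the sketch is reproducible
        self.num_speakers = num_speakers
        self.audio = LinearProjection(audio_dim, HIDDEN_DIM, rng)
        self.text = LinearProjection(text_dim, HIDDEN_DIM, rng)
        self.speaker = LinearProjection(num_speakers, HIDDEN_DIM, rng)
        self.gesture = LinearProjection(gesture_dim, HIDDEN_DIM, rng)

    def __call__(self, audio_feat, text_feat, speaker_id, seed_gesture):
        # One-hot encode the speaker ID before projecting it.
        onehot = [1.0 if i == speaker_id else 0.0 for i in range(self.num_speakers)]
        # Concatenate the four hidden projections into a single condition vector.
        return (self.audio(audio_feat) + self.text(text_feat)
                + self.speaker(onehot) + self.gesture(seed_gesture))


# Usage with made-up feature sizes: 4 modalities x HIDDEN_DIM = 32 values.
fusion = ModalityFusion(audio_dim=4, text_dim=6, num_speakers=3, gesture_dim=5)
cond = fusion([0.1] * 4, [0.2] * 6, speaker_id=1, seed_gesture=[0.0] * 5)
print(len(cond))  # 32
```

Concatenation keeps each modality's contribution in its own slice of the condition vector; a real system might instead sum the projections or feed them to attention layers.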
Abstract
In this paper, we introduce DiffuseStyleGesture+, our solution for the Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA) Challenge 2023, which aims to foster the development of realistic, automated systems for generating conversational gestures. Participants are provided with a pre-processed dataset and their systems are evaluated through crowdsourced scoring. Our proposed model, DiffuseStyleGesture+, leverages a diffusion model to generate gestures automatically. It incorporates a variety of modalities, including audio, text, speaker ID, and seed gestures. These diverse modalities are mapped to a hidden space and processed by a modified diffusion model to produce the corresponding gesture for a given speech input. Upon evaluation, DiffuseStyleGesture+ demonstrated performance on par with the top-tier models in the challenge, showing no significant…
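The abstract states that the fused modalities condition a modified diffusion model that produces the gesture. As a point of reference only, the following sketch shows the standard DDPM forward noising step and epsilon-prediction reverse sampling loop in plain Python. The linear noise schedule, the flat gesture vector, and `dummy_denoiser` (a hypothetical placeholder that predicts zero noise instead of a trained network) are assumptions; the paper's modified model may differ, for example in what the network predicts or how the condition enters it.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_0..beta_{T-1} (original DDPM schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def p_sample(xt, t, cond, denoiser, betas, abar, rng):
    """One reverse step: mean = (x_t - beta_t / sqrt(1 - abar_t) * eps_hat) / sqrt(alpha_t)."""
    eps_hat = denoiser(xt, t, cond)  # network's noise estimate, conditioned on cond
    beta, a = betas[t], abar[t]
    mean = [(x - beta / math.sqrt(1.0 - a) * e) / math.sqrt(1.0 - beta)
            for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(beta)  # the sigma_t^2 = beta_t variance choice
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(dim, cond, denoiser, betas, seed=0):
    """Start from pure Gaussian noise and denoise from t = T-1 down to 0."""
    rng = random.Random(seed)
    abar = alpha_bars(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        x = p_sample(x, t, cond, denoiser, betas, abar, rng)
    return x


def dummy_denoiser(xt, t, cond):
    # Hypothetical stand-in for the trained conditional network: predicts zero noise.
    return [0.0 for _ in xt]


betas = linear_beta_schedule(50)
gesture = sample(dim=6, cond=[0.0] * 32, denoiser=dummy_denoiser, betas=betas)
print(len(gesture))  # 6
```

In a trained system, `denoiser` would be the network that consumes the noisy gesture, the timestep, and the fused condition vector; everything else in the loop is the generic DDPM sampler.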
Taxonomy
Methods: Diffusion
