Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models

Simon Alexanderson; Rajmund Nagy; Jonas Beskow; Gustav Eje Henter

arXiv:2211.09707 · cs.LG · May 17, 2023 · 6 cites



TL;DR

This paper introduces a diffusion-model-based approach for synthesizing human motion, driven by audio (gestures and dance) or by a path (locomotion), enabling high-quality, style-controllable generation with flexible style interpolation.

Contribution

It adapts the DiffWave architecture to 3D pose synthesis, replacing its dilated convolutions with Conformers, and extends classifier-free guidance both to control the strength of stylistic expression and to combine models into ensembles for style interpolation.
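DiffWave-style models generate samples by running a learned denoiser backwards through a fixed noise schedule. The sketch below shows standard DDPM ancestral sampling in plain Python, assuming a linear beta schedule (1e-4 to 0.02) and a hypothetical `denoiser(x_t, t)` that predicts the added noise. It is not the authors' code: in the paper the denoiser is a Conformer-based network that is also conditioned on audio and style, and a pose sequence would be a tensor rather than a flat list.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def ddpm_sample(denoiser: Callable[[Vector, int], Vector], dim: int,
                betas: List[float], seed: int = 0) -> Vector:
    """Ancestral DDPM sampling: start from Gaussian noise and denoise step by step.

    `denoiser(x_t, t)` is a hypothetical noise predictor eps_theta(x_t, t).
    """
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]
    # Cumulative products alpha_bar_t = prod_{s <= t} alpha_s.
    alpha_bars: List[float] = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # sigma_t^2 = beta_t, one of the two standard DDPM variance choices.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x
```

Because the sampler only calls `denoiser`, swapping dilated convolutions for Conformers changes the network inside that function, not the sampling loop itself.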

Findings

01

Achieves top-tier motion quality in gesture and dance generation

02

Enables adjustable stylistic expression in synthesized motion

03

Demonstrates effective style interpolation using ensemble models
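The page does not state how the ensembles are combined. One common way to blend diffusion models is a product of experts: multiplying densities adds their score functions, so the noise predictions can be summed with weights. The sketch below assumes that formulation, combined with classifier-free guidance across several style-conditional predictions; the function and parameter names are hypothetical and the paper's exact method may differ.

```python
from typing import List, Sequence

Vector = List[float]


def guided_style_interpolation(eps_uncond: Vector,
                               eps_styles: Sequence[Vector],
                               weights: Sequence[float],
                               gamma: float = 1.0) -> Vector:
    """Blend several style-conditional noise predictions (assumed product of experts).

    eps_hat = eps_uncond + gamma * sum_i w_i * (eps_style_i - eps_uncond)
    """
    if len(eps_styles) != len(weights):
        raise ValueError("need exactly one weight per style prediction")
    out = list(eps_uncond)
    for eps_s, w in zip(eps_styles, weights):
        # Add each style's weighted guidance direction relative to the unconditional prediction.
        for i, (u, s) in enumerate(zip(eps_uncond, eps_s)):
            out[i] += gamma * w * (s - u)
    return out
```

With `gamma = 1` and weights summing to one, the result reduces to the weighted average of the style-conditional predictions, so shifting weight from one style to another interpolates between them; `gamma > 1` additionally exaggerates the blended style.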

Abstract

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing and co-speech gesticulation, since motion is complex and highly ambiguous given audio, calling for a probabilistic description. Specifically, we adapt the DiffWave architecture to model 3D pose sequences, putting Conformers in place of dilated convolutions for improved modelling power. We also demonstrate control over motion style, using classifier-free guidance to adjust the strength of the stylistic expression. Experiments on gesture and dance generation confirm that the proposed method achieves top-of-the-line motion quality, with distinctive styles whose expression can be made more or less pronounced. We also synthesise path-driven locomotion…
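Classifier-free guidance trains a single network both with and without the style label (by randomly dropping it during training) and mixes the two noise predictions at sampling time. Below is a minimal sketch of the standard mixing rule, assuming both predictions have already been computed as lists of floats; `gamma` is the guidance scale and the function name is hypothetical.

```python
from typing import List

Vector = List[float]


def classifier_free_guidance(eps_uncond: Vector, eps_style: Vector,
                             gamma: float) -> Vector:
    """Standard classifier-free guidance on noise predictions.

    eps_hat = eps_uncond + gamma * (eps_style - eps_uncond)
    gamma = 0: style ignored; gamma = 1: plain style-conditional model;
    0 < gamma < 1: toned-down style; gamma > 1: exaggerated style.
    """
    if len(eps_uncond) != len(eps_style):
        raise ValueError("predictions must have the same length")
    return [u + gamma * (s - u) for u, s in zip(eps_uncond, eps_style)]
```

In the audio-driven setting, both predictions would still be conditioned on the audio; only the style input differs between the two branches.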


Code & Models

Repositories

youngseng/diffusestylegesture (PyTorch)

Models

youngseng/DiffuseStyleGesture (Hugging Face model, ♡ 3)


Taxonomy

Topics: Music Technology and Sound Studies · Advanced Vision and Imaging

Methods: Diffusion