C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Longbin Ji; Pengfei Wei; Yi Ren; Jinglin Liu; Chen Zhang; Xiang Yin

arXiv:2308.15016 · cs.CV · August 30, 2023

TL;DR

C2G2 introduces a controllable, high-fidelity co-speech gesture generation framework using latent diffusion models, enabling stable, temporally consistent, and editable gestures with speaker-specific control for digital avatars.

Contribution

The paper presents a novel two-stage temporal dependency enhancement strategy and a repainting control mechanism within a latent diffusion framework for improved gesture generation.

Findings

01. Outperforms state-of-the-art methods on benchmark datasets.
02. Achieves stable and temporally consistent gesture synthesis.
03. Enables flexible editing and speaker-specific gesture control.

Abstract

Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Additionally, these methods lack effective control over speaker identity and temporal editing of the generated gestures. Focusing on capturing temporal latent information and providing practical control, we propose a Controllable Co-speech Gesture Generation framework, named C2G2. Specifically, we propose a two-stage temporal dependency enhancement strategy motivated by latent diffusion models. We further introduce two key features to C2G2, namely a speaker-specific decoder to generate speaker-related real-length skeletons and a repainting strategy for flexible gesture generation/editing. Extensive experiments on benchmark gesture datasets…
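The abstract names a repainting strategy for gesture editing but does not spell it out here. As a rough illustration only, the sketch below shows one common way masked editing is done with a diffusion sampler: frames marked as kept are overwritten at every reverse step with a re-noised copy of the known latent, while the remaining frames come from the model's reverse step. This is a generic RePaint-style loop under assumed DDPM settings, not the authors' implementation. The names `make_schedule`, `dummy_denoiser`, and `repaint_sample`, the linear beta schedule, and the `(frames, latent_dim)` layout are all hypothetical, and the resampling jumps used in some repainting schemes are omitted.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def dummy_denoiser(x_t, t):
    """Hypothetical stand-in for a trained noise predictor; always predicts zero noise."""
    return np.zeros_like(x_t)


def repaint_sample(known, keep_mask, denoiser, num_steps=50, seed=0):
    """Masked reverse diffusion over a latent gesture sequence.

    known:     array of shape (frames, latent_dim) holding the latents to preserve
    keep_mask: array of shape (frames, 1); 1 keeps a frame, 0 regenerates it
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(known.shape)  # start from pure noise

    for t in range(num_steps - 1, -1, -1):
        # Standard DDPM reverse step applied to the whole sequence.
        eps = denoiser(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x_generated = mean + np.sqrt(betas[t]) * noise

        # Kept frames: forward-noise the known latent to the level of step t-1,
        # or use it unchanged at the final step.
        if t > 0:
            ab_prev = alpha_bars[t - 1]
            x_known = (np.sqrt(ab_prev) * known
                       + np.sqrt(1.0 - ab_prev) * rng.standard_normal(x.shape))
        else:
            x_known = known

        # Combine: kept frames follow the known latent, the rest follow the model.
        x = keep_mask * x_known + (1.0 - keep_mask) * x_generated

    return x
```

With a real noise predictor in place of `dummy_denoiser`, calling `repaint_sample(known, keep_mask, denoiser)` would regenerate only the frames where `keep_mask` is 0 and return the kept frames unchanged, which is the behaviour one would want from temporal editing of a gesture clip.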

Code & Models

Repositories

C2G2-Gesture/C2G2
PyTorch · Official

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications

Methods: Diffusion