Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters

Steven Hogue; Chenxu Zhang; Yapeng Tian; Xiaohu Guo

arXiv:2412.14333·cs.CV·December 20, 2024

TL;DR

This paper introduces a unified diffusion-based model with adapters that jointly generates co-speech gestures and talking head movements, reducing complexity and parameter count while maintaining high-quality output.

Contribution

It presents a novel single-network architecture that models face and body movements together using shared weights and adapters, improving efficiency and coherence.

Findings

01. Maintains state-of-the-art performance in both co-speech gesture generation and talking head generation.

02. Significantly reduces the number of parameters compared with using separate models for face and body.

03. Generates face and body motion jointly within a single network, preserving output quality.

Abstract

Recent advances in co-speech gesture and talking head generation have been impressive, yet most methods focus on only one of the two tasks. Those that attempt to generate both often rely on separate models or network modules, increasing training complexity and ignoring the inherent relationship between face and body movements. To address these challenges, we propose a novel model architecture that jointly generates face and body motions within a single network. This approach leverages shared weights between modalities, facilitated by adapters that enable adaptation to a common latent space. Our experiments demonstrate that the proposed framework not only maintains state-of-the-art co-speech gesture and talking head generation performance but also significantly reduces the number of parameters required.
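To make the parameter-saving argument concrete, here is a rough back-of-the-envelope sketch of why one shared backbone plus small per-modality adapters can need far fewer parameters than two separate networks. Everything in it is an assumption made for illustration: the block layout, the bottleneck-style adapter, and the sizes `D_MODEL`, `N_LAYERS`, `D_FF`, and `D_BOTTLENECK` are hypothetical and are not taken from the paper, whose actual architecture and parameter counts are not given on this page.

```python
# Illustrative parameter arithmetic only. All sizes and the block layout
# are hypothetical assumptions, not the paper's architecture.

def linear_params(d_in, d_out):
    """Weights plus biases of one fully connected layer."""
    return d_in * d_out + d_out

def backbone_params(d_model, n_layers, d_ff):
    """Rough count for a stack of transformer-style blocks:
    four attention projections plus a two-layer feed-forward network."""
    attention = 4 * linear_params(d_model, d_model)
    feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    return n_layers * (attention + feed_forward)

def adapter_params(d_model, d_bottleneck, n_layers):
    """One bottleneck adapter per block: down-projection, then up-projection."""
    down = linear_params(d_model, d_bottleneck)
    up = linear_params(d_bottleneck, d_model)
    return n_layers * (down + up)

# Hypothetical sizes chosen only to show the order of magnitude.
D_MODEL, N_LAYERS, D_FF, D_BOTTLENECK = 512, 8, 2048, 64

# Baseline: one full network for the face and another for the body.
separate = 2 * backbone_params(D_MODEL, N_LAYERS, D_FF)

# Joint model: one shared backbone, plus one adapter set per modality.
shared = (backbone_params(D_MODEL, N_LAYERS, D_FF)
          + 2 * adapter_params(D_MODEL, D_BOTTLENECK, N_LAYERS))

savings = 1 - shared / separate

print(f"separate models: {separate:,} parameters")
print(f"shared + adapters: {shared:,} parameters")
print(f"relative savings: {savings:.1%}")
```

Under these assumed sizes the adapters add only a small fraction of the backbone's parameters, so the joint model comes in at a little over half the size of the two-network baseline. The real ratio depends on the paper's actual design and may differ substantially.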

Code & Models

Repositories

ditzley/joint-gestures-and-face (PyTorch, official)

Taxonomy

Topics: Face recognition and analysis

Methods: Focus