GaussianSpeech: Audio-Driven Gaussian Avatars

Shivangi Aneja; Artem Sevastopolsky; Tobias Kirschstein; Justus Thies; Angela Dai; Matthias Nießner

arXiv:2411.18675 · cs.CV · December 2, 2024

TL;DR

GaussianSpeech is a new method that synthesizes realistic 3D talking head avatars from audio, capturing detailed facial expressions and movements in real time using a novel Gaussian splatting representation and audio-conditioned transformer.

Contribution

It introduces a compact 3D Gaussian splatting-based avatar representation and an audio-conditioned transformer for realistic, expressive, and real-time 3D head animation from speech.
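
As a rough illustration of what "audio-conditioned" can mean in a transformer, the sketch below implements plain scaled dot-product cross-attention in which a set of motion queries attends over per-frame audio features. This is a generic textbook mechanism, not the paper's architecture: the names `cross_attention`, `motion_queries`, and `audio_feats`, the dimensions, and the random inputs are all hypothetical, and the learned projection matrices a real model would use are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, audio_feats):
    """Scaled dot-product cross-attention without learned projections.

    Each query (one per output motion frame) attends over all audio
    frames; the audio features serve as both keys and values.
    Returns the attended features and the attention weights.
    """
    dim = len(audio_feats[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query and every audio frame.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in audio_feats]
        w = softmax(scores)
        # Weighted average of audio features (values).
        mixed = [
            sum(w[t] * audio_feats[t][d] for t in range(len(audio_feats)))
            for d in range(dim)
        ]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights


# Hypothetical sizes and random stand-in data, for illustration only.
random.seed(0)
T_AUDIO, T_MOTION, DIM = 6, 4, 8
audio_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(T_AUDIO)]
motion_queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(T_MOTION)]

conditioned, attn = cross_attention(motion_queries, audio_feats)
```

Each row of `attn` is a probability distribution over audio frames, so every output motion frame is a convex combination of audio features. In a trained model, the attended features would then be decoded into animation parameters; that step is left out here.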

Findings

01. Achieves state-of-the-art visual realism and motion coherence.

02. Renders diverse facial expressions in real time.

03. Introduces a new large-scale audio-visual dataset of talking humans.

Abstract

We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple speech signal with 3D Gaussian splatting to create realistic, temporally coherent motion sequences. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle- and perceptually-based losses to synthesize facial details, including wrinkles that occur with different expressions. To enable sequence modeling of 3D Gaussian splats with audio, we devise an audio-conditioned transformer model capable of extracting lip and expression features directly from audio input. Due to the absence of high-quality datasets of…

Code & Models

Repositories

shivangi-aneja/GaussianSpeech (PyTorch, official)

Taxonomy

Topics: Music Technology and Sound Studies