GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting

Madhav Agarwal; Mingtian Zhang; Laura Sevilla-Lara; Steven McDonagh

arXiv:2512.10939·cs.CV·December 12, 2025

Open Access

TL;DR

GaussianHeadTalk introduces a real-time, audio-driven 3D talking head system that combines Gaussian Splatting with 3D Morphable Models and transformer-based parameter prediction for stable, high-fidelity avatar videos.

Contribution

The paper integrates Gaussian Splatting with 3D Morphable Models and transformer-based prediction of model parameters from audio to improve stability and realism in speech-driven talking head videos.

Findings

01

Achieves real-time performance with high visual fidelity.

02

Demonstrates improved temporal stability over previous methods.

03

Reports competitive quantitative and qualitative results.

Abstract

Speech-driven talking heads have recently emerged and enable interactive avatars. However, real-world applications are limited, as current methods are either high in visual fidelity but slow, or fast yet temporally unstable. Diffusion methods provide realistic image generation, yet struggle with one-shot settings. Gaussian Splatting approaches are real-time, yet inaccuracies in facial tracking, or inconsistent Gaussian mappings, lead to unstable outputs and video artifacts that are detrimental to realistic use cases. We address this problem by mapping Gaussian Splatting using 3D Morphable Models to generate person-specific avatars. We introduce transformer-based prediction of model parameters, directly from audio, to drive temporal consistency. From monocular video and independent audio speech inputs, our method enables generation of real-time talking head videos where we report competitive…
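
As a rough illustration of what it can mean to map Gaussians onto a 3D Morphable Model, the sketch below binds each Gaussian center to a mesh triangle with fixed barycentric weights, so the Gaussians follow the mesh when the model parameters change. This is a toy under stated assumptions, not the paper's implementation: the function names (morph_vertices, gaussian_centers), the linear model form, the binding scheme, and the data are all hypothetical, and the parameters are supplied by hand where the paper would predict them from audio with a transformer.

```python
# Illustrative sketch only: names, shapes, and the binding scheme are
# assumptions made for this example, not details taken from the paper.

def morph_vertices(mean, basis, params):
    """Linear morphable model: vertex i = mean[i] + sum_k params[k] * basis[k][i]."""
    return [
        tuple(m[a] + sum(p * b[i][a] for p, b in zip(params, basis)) for a in range(3))
        for i, m in enumerate(mean)
    ]


def gaussian_centers(vertices, bindings):
    """Place each Gaussian at a fixed barycentric point of its bound triangle."""
    centers = []
    for (i, j, k), (wi, wj, wk) in bindings:
        vi, vj, vk = vertices[i], vertices[j], vertices[k]
        centers.append(tuple(wi * vi[a] + wj * vj[a] + wk * vk[a] for a in range(3)))
    return centers


# Toy mesh: a single triangle and one parameter that lifts vertex 2 along z.
mean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
basis = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]]
# One Gaussian bound to triangle (0, 1, 2) with fixed barycentric weights.
bindings = [((0, 1, 2), (0.25, 0.25, 0.5))]

rest = gaussian_centers(morph_vertices(mean, basis, [0.0]), bindings)
moved = gaussian_centers(morph_vertices(mean, basis, [2.0]), bindings)
```

Because the binding weights stay fixed, a smooth parameter sequence gives smooth Gaussian motion in this toy setup, which is one way to picture why consistent Gaussian-to-model mappings matter for temporal stability.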

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing