GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting
Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen, Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

TL;DR
GaussianTalker introduces a novel 3D Gaussian Splatting-based approach for audio-driven talking head synthesis, achieving realistic, synchronized lip movements and high-speed rendering suitable for real-time applications.
Contribution
The paper presents GaussianTalker, a new method that explicitly controls facial motion using 3D Gaussian representations, improving synchronization and visual quality over prior NeRF-based techniques.
Findings
Outperforms state-of-the-art in lip synchronization and visual quality
Achieves 130 FPS rendering speed on NVIDIA RTX4090
Provides stable, realistic talking head videos
Abstract
Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. With the explicit representation property of 3D Gaussians, intuitive control of the facial motion is achieved by binding Gaussians to 3D facial models. GaussianTalker consists of two modules, Speaker-specific Motion Translator and Dynamic Gaussian Renderer. Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction and customized lip…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
