GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

Hongyun Yu; Zhan Qu; Qihang Yu; Jianchuan Chen; Zhonghua Jiang; Zhiwen Chen; Shengyu Zhang; Jimin Xu; Fei Wu; Chengfei Lv; Gang Yu

arXiv:2404.14037 · cs.CV · August 12, 2024

TL;DR

GaussianTalker introduces a novel 3D Gaussian Splatting-based approach for audio-driven talking head synthesis, achieving realistic, synchronized lip movements and high-speed rendering suitable for real-time applications.

Contribution

The paper presents GaussianTalker, a new method that explicitly controls facial motion using 3D Gaussian representations, improving synchronization and visual quality over prior NeRF-based techniques.

Findings

01. Outperforms state-of-the-art methods in lip synchronization and visual quality

02. Achieves a rendering speed of 130 FPS on an NVIDIA RTX 4090 GPU

03. Provides stable, realistic talking head videos

Abstract

Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. With the explicit representation property of 3D Gaussians, intuitive control of the facial motion is achieved by binding Gaussians to 3D facial models. GaussianTalker consists of two modules, Speaker-specific Motion Translator and Dynamic Gaussian Renderer. Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction and customized lip…
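
The abstract's idea of binding Gaussians to a 3D facial model can be illustrated with a minimal sketch. The excerpt above does not give the binding details, so everything here is an assumption made for illustration: each Gaussian is attached to one mesh triangle through barycentric weights plus an offset along the triangle normal, and the names `BoundGaussian` and `gaussian_center` are hypothetical. The point it shows is that once a Gaussian is bound this way, moving the mesh vertices moves the Gaussian with them, which is what makes the control explicit.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


@dataclass
class BoundGaussian:
    """Hypothetical binding record; not taken from the paper."""
    face: int      # index of the triangle the Gaussian is attached to
    bary: Vec3     # barycentric weights inside that triangle (sum to 1)
    offset: float  # signed distance from the surface along the normal


def gaussian_center(g: BoundGaussian,
                    vertices: List[Vec3],
                    faces: List[Tuple[int, int, int]]) -> Vec3:
    """Recompute the Gaussian's center from the current mesh vertices."""
    i, j, k = faces[g.face]
    a, b, c = vertices[i], vertices[j], vertices[k]
    w0, w1, w2 = g.bary
    # Point on the triangle given by the barycentric weights.
    on_surface = tuple(w0 * a[d] + w1 * b[d] + w2 * c[d] for d in range(3))
    # Unit normal of the (possibly deformed) triangle.
    normal = normalize(cross(sub(b, a), sub(c, a)))
    return tuple(on_surface[d] + g.offset * normal[d] for d in range(3))


# One triangle in the z = 0 plane, then the same triangle lifted by 0.5.
faces = [(0, 1, 2)]
rest_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved_vertices = [(x, y, z + 0.5) for (x, y, z) in rest_vertices]

g = BoundGaussian(face=0, bary=(0.5, 0.25, 0.25), offset=0.1)
print(gaussian_center(g, rest_vertices, faces))
print(gaussian_center(g, moved_vertices, faces))
```

In this toy setup the Gaussian's center follows the triangle when the mesh is translated. A full system would also have to update each Gaussian's rotation and scale from the deformed triangle, which this sketch leaves out.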
