PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
Yifan Xie, Tao Feng, Xin Zhang, Xiangyang Luo, Zixuan Guo, Weijiang Yu, Heng Chang, Fei Ma, Fei Richard Yu

TL;DR
PointTalk introduces a novel 3D Gaussian-based approach for talking head synthesis that effectively captures lip movements and enhances audio-lip synchronization using dynamic lip point clouds and cross-modal feature integration.
Contribution
The paper proposes a new 3D Gaussian-based method with dynamic lip point clouds and an audio-point enhancement module for improved talking head synthesis.
Findings
Achieves higher-fidelity visual quality than previous methods.
Demonstrates improved audio-lip synchronization.
Outperforms previous methods in experiments.
Abstract
Talking head synthesis with arbitrary speech audio is a crucial challenge in the field of digital humans. Recently, methods based on radiance fields have received increasing attention due to their ability to synthesize high-fidelity and identity-consistent talking heads from just a few minutes of training video. However, due to the limited scale of the training data, these methods often exhibit poor performance in audio-lip synchronization and visual quality. In this paper, we propose a novel 3D Gaussian-based method called PointTalk, which constructs a static 3D Gaussian field of the head and deforms it in sync with the audio. It also incorporates an audio-driven dynamic lip point cloud as a critical component of the conditional information, thereby facilitating the effective synthesis of talking heads. Specifically, the initial step involves generating the corresponding lip point…
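The abstract describes two ideas at a high level: a static 3D Gaussian field that is deformed in sync with the audio, and lip point-cloud information fused with audio as conditioning. The NumPy sketch below is only an illustration of that general pattern, not the paper's architecture. The function names, the feature dimensions, the additive audio conditioning, the cross-attention fusion, and the linear offset head `W` are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query row attends over all key rows.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

def deform_gaussians(centers, cond, W):
    # Hypothetical linear offset head: per-Gaussian condition -> 3D offset
    # added to the static centers.
    return centers + cond @ W

rng = np.random.default_rng(0)
N, P, D = 8, 5, 4                          # Gaussians, lip points, feature dim (all assumed)
centers = rng.normal(size=(N, 3))          # static Gaussian centers
gauss_feat = rng.normal(size=(N, D))       # per-Gaussian features (assumed)
lip_point_feat = rng.normal(size=(P, D))   # lip point-cloud features (assumed)
audio_feat = rng.normal(size=(D,))         # audio feature for one frame (assumed)

# Assumed fusion: condition the lip points on audio by simple addition,
# then let every Gaussian attend over the audio-conditioned lip points.
point_audio = lip_point_feat + audio_feat
cond, weights = cross_attention(gauss_feat, point_audio, point_audio)

W = 0.01 * rng.normal(size=(D, 3))         # stand-in for learned weights
deformed = deform_gaussians(centers, cond, W)
print(deformed.shape)
```

In a trained system the weights would be learned and the deformation would typically also affect other Gaussian attributes such as scale and rotation; the sketch moves only the centers to keep the data flow visible.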
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Human Motion and Animation
Methods: Softmax · Attention Is All You Need
