PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis

Yifan Xie; Tao Feng; Xin Zhang; Xiangyang Luo; Zixuan Guo; Weijiang Yu; Heng Chang; Fei Ma; Fei Richard Yu

arXiv:2412.08504·cs.SD·December 12, 2024

TL;DR

PointTalk introduces a novel 3D Gaussian-based approach for talking head synthesis that effectively captures lip movements and enhances audio-lip synchronization using dynamic lip point clouds and cross-modal feature integration.

Contribution

The paper proposes a new 3D Gaussian-based method with dynamic lip point clouds and an audio-point enhancement module for improved talking head synthesis.

Findings

1. Achieves superior high-fidelity visual quality.

2. Demonstrates improved audio-lip synchronization.

3. Outperforms previous methods in experiments.

Abstract

Talking head synthesis with arbitrary speech audio is a crucial challenge in the field of digital humans. Recently, methods based on radiance fields have received increasing attention due to their ability to synthesize high-fidelity and identity-consistent talking heads from just a few minutes of training video. However, due to the limited scale of the training data, these methods often exhibit poor performance in audio-lip synchronization and visual quality. In this paper, we propose a novel 3D Gaussian-based method called PointTalk, which constructs a static 3D Gaussian field of the head and deforms it in sync with the audio. It also incorporates an audio-driven dynamic lip point cloud as a critical component of the conditional information, thereby facilitating the effective synthesis of talking heads. Specifically, the initial step involves generating the corresponding lip point…
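
The idea of a static Gaussian field that is deformed in sync with the audio can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Gaussian` record, the `deform` function, the linear map `weights`, the `mouth` anchor, and the distance falloff are all hypothetical stand-ins for whatever learned deformation PointTalk actually uses. The sketch only shows the general pattern of keeping a canonical field fixed and producing a per-frame copy whose centers are shifted by an audio-conditioned offset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    center: tuple   # (x, y, z) in the canonical, static head
    scale: float    # isotropic scale (a simplification)
    opacity: float


def deform(field, audio_feat, weights, mouth=(0.0, 0.0, 0.0),
           sigma=0.25, max_shift=0.1):
    """Return a deformed copy of `field` for one audio frame.

    Hypothetical rule: a bounded 3D offset is computed from a linear
    map of the audio feature, then applied to each Gaussian with a
    weight that decays with distance from the `mouth` anchor.
    """
    # One offset component per axis, bounded to [-max_shift, max_shift].
    offset = [
        max_shift * math.tanh(sum(w * a for w, a in zip(row, audio_feat)))
        for row in weights
    ]
    deformed = []
    for g in field:
        dist2 = sum((c - m) ** 2 for c, m in zip(g.center, mouth))
        falloff = math.exp(-dist2 / (2.0 * sigma ** 2))
        center = tuple(c + falloff * o for c, o in zip(g.center, offset))
        deformed.append(Gaussian(center, g.scale, g.opacity))
    return deformed


# Canonical field: one Gaussian at the mouth anchor, one far from it.
static_field = [
    Gaussian((0.0, 0.0, 0.0), 0.05, 1.0),
    Gaussian((10.0, 0.0, 0.0), 0.05, 1.0),
]
# Hypothetical 3x2 linear map from a 2D audio feature to a 3D offset.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

silent_frame = deform(static_field, [0.0, 0.0], weights)
voiced_frame = deform(static_field, [1.0, 0.0], weights)
```

In this toy setup a zero audio feature leaves the field unchanged, while a non-zero feature moves only the Gaussians near the mouth anchor; the canonical `static_field` itself is never modified.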

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Human Motion and Animation

Methods: Softmax · Attention Is All You Need