D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis
Yuhang Guo, Kaijun Deng, Siyang Song, Jindong Xie, Wenhui Ma, Linlin Shen

TL;DR
D^3-Talker introduces a dual-branch framework for few-shot 3D talking head synthesis, effectively decoupling general and personalized deformations to improve lip synchronization and image quality with limited data.
Contribution
It proposes a novel dual-branch deformation model with a similarity contrastive loss for better decoupling and a Coarse-to-Fine module for enhanced image quality in few-shot 3D talking head synthesis.
Findings
Outperforms state-of-the-art methods in rendering quality
Achieves accurate audio-lip synchronization
Works effectively with limited training data
Abstract
A key challenge in 3D talking head synthesis lies in the reliance on a long-duration talking head video to train a new model for each target identity from scratch. Recent methods have attempted to address this issue by extracting general features from audio through pre-training models. However, since audio contains information irrelevant to lip motion, existing approaches typically struggle to map the given audio to realistic lip behaviors in the target face when trained on only a few frames, causing poor lip synchronization and talking head image quality. This paper proposes D^3-Talker, a novel approach that constructs a static 3D Gaussian attribute field and employs audio and Facial Motion signals to independently control two distinct Gaussian attribute deformation fields, effectively decoupling the predictions of general and personalized deformations. We design a novel similarity…
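The dual-branch idea described in the abstract can be illustrated with a toy sketch: a static set of Gaussian centers is displaced by two independent deformation fields, one conditioned on audio features and one on facial motion features. Everything below is an assumption made for illustration and is not the authors' implementation: the names `linear_field` and `deform` are hypothetical, the random linear maps stand in for learned networks, only Gaussian positions are deformed, and the two offsets are combined by simple addition.

```python
import random


def linear_field(in_dim, out_dim, rng, scale=0.01):
    """Hypothetical stand-in for a learned deformation field: a random linear map."""
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

    def field(inputs):
        # One output value per weight row (a plain matrix-vector product).
        return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

    return field


def deform(static_positions, audio_feat, motion_feat, audio_field, motion_field):
    """Displace each static Gaussian center by the offsets of both branches.

    Each branch sees the Gaussian's own position concatenated with its driving
    signal, so the two offsets are predicted independently of each other.
    Additive composition is an assumption of this sketch.
    """
    deformed = []
    for pos in static_positions:
        audio_offset = audio_field(pos + audio_feat)
        motion_offset = motion_field(pos + motion_feat)
        deformed.append([p + a + m for p, a, m in zip(pos, audio_offset, motion_offset)])
    return deformed


rng = random.Random(0)
static_positions = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]  # two toy Gaussian centers
audio_feat = [0.1, 0.2, 0.3, 0.4]                      # hypothetical 4-D audio feature
motion_feat = [0.5, -0.5]                              # hypothetical 2-D facial motion feature

audio_field = linear_field(3 + len(audio_feat), 3, rng)
motion_field = linear_field(3 + len(motion_feat), 3, rng)
deformed_positions = deform(static_positions, audio_feat, motion_feat, audio_field, motion_field)
```

Keeping the two fields as separate callables mirrors the decoupling described above: either branch can be swapped or retrained without touching the other or the static field.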
Taxonomy
Topics: Additive Manufacturing and 3D Printing Technologies · Modular Robots and Swarm Intelligence · Interactive and Immersive Displays
