SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis
Peng Jia, Zhen Xiao, Jia Li, Xueliang Liu, Zhenzhen Hu, and Lingyun Yu

TL;DR
SDTalk is a one-shot 3D Gaussian Splatting framework for real-time talking head synthesis that generalizes to unseen identities by combining structured facial priors with a dual-branch motion field.
Contribution
The framework generalizes across identities without personalized training or fine-tuning: structured facial priors guide complete head reconstruction from a single image, and a dual-branch motion field models coarse and fine facial dynamics.
Findings
Outperforms existing methods in visual quality.
Achieves real-time inference efficiency.
Successfully generalizes to unseen identities.
Abstract
High-quality, real-time talking head synthesis remains a fundamental challenge in computer vision. Existing reconstruction- and rendering-based methods typically rely on identity-specific models, limiting cross-identity generalization. To address this issue, we propose SDTalk, a one-shot 3D Gaussian Splatting (3DGS)-based framework that generalizes to unseen identities without personalized training or fine-tuning. Our framework comprises two modules with a two-stage training strategy. In the first stage, we incorporate structured facial priors into the reconstruction module and separately predict 3DGS parameters for visible and occluded regions, enabling complete head reconstruction from a single image. In the second stage, we introduce a dual-branch motion field to model coarse and fine facial dynamics, improving detail fidelity and lip synchronization. Experiments demonstrate that…
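The two stages described in the abstract can be pictured with a minimal sketch. The abstract does not say how the visible and occluded predictions are merged or how the coarse and fine branches are combined, so everything below is an assumption made for illustration: the names `GaussianSet`, `merge_regions`, and `apply_dual_branch_motion` are hypothetical, only Gaussian centers are modeled, merging is plain concatenation, and the two motion branches are combined by summing per-Gaussian offsets. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GaussianSet:
    """Hypothetical container holding only the 3D centers of the Gaussians."""
    means: List[Vec3]


def merge_regions(visible: GaussianSet, occluded: GaussianSet) -> GaussianSet:
    # Stage 1 (assumed): Gaussians for visible and occluded regions are
    # predicted separately and concatenated into one head representation.
    return GaussianSet(means=visible.means + occluded.means)


def apply_dual_branch_motion(
    head: GaussianSet, coarse: List[Vec3], fine: List[Vec3]
) -> GaussianSet:
    # Stage 2 (assumed): each Gaussian center is displaced by the sum of a
    # coarse offset and a fine offset. Additive composition is an
    # illustrative choice, not something stated in the abstract.
    if not (len(head.means) == len(coarse) == len(fine)):
        raise ValueError("need one coarse and one fine offset per Gaussian")
    moved = [
        tuple(m + c + f for m, c, f in zip(mean, d_coarse, d_fine))
        for mean, d_coarse, d_fine in zip(head.means, coarse, fine)
    ]
    return GaussianSet(means=moved)
```

In this sketch the coarse offsets would carry large-scale motion and the fine offsets small corrections such as lip detail; in a real system both would come from learned networks rather than hand-written lists.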
