Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting
Tong Shi, Melonie de Almeida, Daniela Ivanova, Nicolas Pugeault, Paul Henderson

TL;DR
Splat-Portrait introduces a novel Gaussian-splatting-based method for talking head generation that automatically disentangles static 3D structure and lip motion from a single image and audio, achieving superior visual quality without 3D supervision.
Contribution
The paper presents a new approach that leverages Gaussian splatting for 3D head reconstruction and lip motion synthesis without relying on domain-specific heuristics or 3D supervision.
Findings
Outperforms previous methods in visual quality for talking head generation
Effectively disentangles static 3D structure from lip motion
Operates without 3D supervision or landmarks
Abstract
Talking Head Generation aims to synthesize natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specific heuristics, such as warping-based facial motion representation priors, to animate talking motions, yet they still produce inaccurate 3D avatar reconstructions, which undermines the realism of the generated animations. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis. Our approach automatically learns to disentangle a single portrait image into a static 3D reconstruction, represented as static Gaussian Splatting, and a predicted whole-image 2D background. It then generates natural lip motion conditioned on input audio, without any motion-driven priors. Training is driven purely by 2D reconstruction and…
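The abstract describes splitting a portrait into a rendered 3D foreground and a predicted 2D background, trained with a 2D reconstruction signal. As a rough illustration only, the sketch below shows standard alpha compositing of a foreground over a background, followed by a mean absolute (L1) reconstruction error. The function names `composite` and `l1_loss`, the flat per-pixel list layout, and the choice of an L1 error are all assumptions made for this example; the listing does not specify how Splat-Portrait blends its outputs or which losses it uses.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, ...]  # one RGB triple with values in [0, 1]


def composite(fg: Sequence[Pixel], alpha: Sequence[float], bg: Sequence[Pixel]) -> List[Pixel]:
    """Blend a rendered foreground over a background, pixel by pixel.

    Hypothetical helper: `fg` and `alpha` stand in for the colour and opacity
    a splat renderer would produce, and `bg` for a predicted 2D background.
    """
    if not (len(fg) == len(alpha) == len(bg)):
        raise ValueError("fg, alpha and bg must have the same number of pixels")
    return [
        tuple(a * f + (1.0 - a) * b for f, b in zip(fg_px, bg_px))
        for fg_px, a, bg_px in zip(fg, alpha, bg)
    ]


def l1_loss(pred: Sequence[Pixel], target: Sequence[Pixel]) -> float:
    """Mean absolute error over all colour channels (an assumed 2D loss)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    total = sum(
        abs(p - t)
        for pred_px, target_px in zip(pred, target)
        for p, t in zip(pred_px, target_px)
    )
    return total / (len(pred) * len(pred[0]))


# Two-pixel toy image: the first pixel is fully covered by the foreground,
# the second is fully transparent, so the background shows through.
foreground = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
opacity = [1.0, 0.0]
background = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]

image = composite(foreground, opacity, background)
loss = l1_loss(image, [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
```

In this toy case the composited image matches the target exactly, so the reconstruction error is zero. With a differentiable renderer, an error of this kind could be backpropagated to both the foreground and the background predictions.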
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Speech and Audio Processing
