Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting

Tong Shi; Melonie de Almeida; Daniela Ivanova; Nicolas Pugeault; Paul Henderson

arXiv:2601.18633 · cs.CV · January 27, 2026

TL;DR

Splat-Portrait introduces a Gaussian-splatting-based method for talking head generation. From a single portrait image, it automatically separates a static 3D head structure from a 2D background, and it synthesizes lip motion from audio. It achieves superior visual quality without 3D supervision.

Contribution

The paper presents a new approach that leverages Gaussian splatting for 3D head reconstruction and lip motion synthesis without relying on domain-specific heuristics or 3D supervision.

Findings

01. Outperforms previous methods in visual quality for talking head generation.

02. Effectively disentangles static 3D structure from lip motion.

03. Operates without 3D supervision or landmarks.

Abstract

Talking Head Generation aims to synthesize natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specific heuristics, such as warping-based facial motion representation priors, to animate talking motions, yet they still produce inaccurate 3D avatar reconstructions, undermining the realism of the generated animations. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis. Our approach automatically learns to disentangle a single portrait image into a static 3D reconstruction, represented as static Gaussian Splatting, and a predicted whole-image 2D background. It then generates natural lip motion conditioned on input audio, without any motion-driven priors. Training is driven purely by 2D reconstruction and…
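The abstract says the portrait is split into static Gaussians and a predicted whole-image 2D background, but the excerpt does not say how the two are combined at render time. Below is a minimal sketch that assumes standard front-to-back alpha compositing, as in conventional Gaussian splatting. Depth-sorted splats are blended at one pixel, and whatever transmittance remains is filled by the background color. The names `Splat2D`, `gaussian_alpha`, and `composite_pixel` are hypothetical. The clamping thresholds are modeled on common 3D Gaussian Splatting rasterizers, not taken from this paper. Projection from 3D to screen space and the audio-driven lip motion are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class Splat2D:
    """Hypothetical container for a Gaussian already projected to screen space."""
    mean: Tuple[float, float]          # pixel-space center (x, y)
    cov: Tuple[float, float, float]    # 2D covariance entries (sxx, sxy, syy)
    opacity: float                     # base opacity in [0, 1]
    color: Tuple[float, float, float]  # RGB in [0, 1]
    depth: float                       # camera-space depth; smaller means closer

def gaussian_alpha(s: Splat2D, px: float, py: float) -> float:
    """Opacity of splat s at pixel (px, py): opacity * exp(-0.5 * d^T Sigma^-1 d)."""
    sxx, sxy, syy = s.cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        return 0.0  # degenerate covariance contributes nothing
    # Closed-form inverse of the 2x2 covariance matrix
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    dx, dy = px - s.mean[0], py - s.mean[1]
    power = -0.5 * (ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy)
    return min(0.99, s.opacity * math.exp(power))

def composite_pixel(splats: List[Splat2D], px: float, py: float,
                    background: Sequence[float]) -> Tuple[float, float, float]:
    """Front-to-back alpha compositing of splats over a 2D background color."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda sp: sp.depth):  # nearest splat first
        a = gaussian_alpha(s, px, py)
        if a < 1.0 / 255.0:
            continue  # skip negligible contributions
        for k in range(3):
            color[k] += transmittance * a * s.color[k]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; farther splats are hidden
    # Light not absorbed by any splat comes from the predicted background.
    return (color[0] + transmittance * background[0],
            color[1] + transmittance * background[1],
            color[2] + transmittance * background[2])

red = Splat2D(mean=(0.0, 0.0), cov=(1.0, 0.0, 1.0), opacity=0.5,
              color=(1.0, 0.0, 0.0), depth=2.0)
print(composite_pixel([red], 0.0, 0.0, (0.0, 0.0, 1.0)))
```

At its center, a single half-opaque red splat over a blue background yields an even mix of the two. In this sketch, the background only shows through where the splats leave transmittance, which is one plausible way a static head and a separate background could be rendered into one frame.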

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Speech and Audio Processing