EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation

Arpita Saggar; Jonathan C. Darling; Duygu Sarikaya; David C. Hogg

arXiv:2603.07604 · cs.CV · March 10, 2026

TL;DR

EmbedTalk introduces embedding-driven Gaussian deformation for real-time talking head synthesis. It outperforms tri-plane-based methods in rendering quality, efficiency and model compactness, and runs at over 60 FPS on a mobile GPU.

Contribution

The paper presents EmbedTalk, a novel embedding-based approach for speech-driven facial deformation that improves on tri-plane representations in both rendering quality and computational efficiency.
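The embedding-driven idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each Gaussian carries its own learnable vector, which is concatenated with a per-frame speech feature and passed through a small MLP that predicts an offset to the Gaussian's centre. The sizes (`N_GAUSSIANS`, `EMBED_DIM`, `AUDIO_DIM`, `HIDDEN`), the two-layer ReLU MLP, the position-only offset and the function name `deform` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_GAUSSIANS = 1000   # Gaussians in the canonical head
EMBED_DIM = 32       # per-Gaussian learnable embedding size
AUDIO_DIM = 64       # per-frame speech feature size
HIDDEN = 128         # MLP hidden width

# Canonical (undeformed) Gaussian centres and one embedding per Gaussian.
means = rng.normal(size=(N_GAUSSIANS, 3)).astype(np.float32)
embeddings = rng.normal(scale=0.1, size=(N_GAUSSIANS, EMBED_DIM)).astype(np.float32)

# Weights of a tiny two-layer MLP (random here; learned in practice).
W1 = rng.normal(scale=0.1, size=(EMBED_DIM + AUDIO_DIM, HIDDEN)).astype(np.float32)
b1 = np.zeros(HIDDEN, dtype=np.float32)
W2 = rng.normal(scale=0.01, size=(HIDDEN, 3)).astype(np.float32)
b2 = np.zeros(3, dtype=np.float32)


def deform(means, embeddings, audio_feat):
    """Return deformed Gaussian centres for one frame of speech features (hypothetical)."""
    audio_feat = np.asarray(audio_feat, dtype=np.float32)
    # Share the frame-level speech feature with every Gaussian.
    audio = np.broadcast_to(audio_feat, (embeddings.shape[0], audio_feat.shape[-1]))
    # Each Gaussian's input is its own embedding plus the shared speech feature.
    x = np.concatenate([embeddings, audio], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    offsets = h @ W2 + b2              # per-Gaussian xyz offset
    return means + offsets
```

For example, `deform(means, embeddings, rng.normal(size=AUDIO_DIM))` returns a `(1000, 3)` array of deformed centres. Each Gaussian reads its own vector instead of sampling a shared grid, so no grid resolution is involved; the trade-off is that nearby Gaussians are not tied together by construction, which is the explicit spatial structure tri-planes provide.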

Findings

01

Outperforms existing 3DGS methods in rendering quality and lip sync.

02

Achieves over 60 FPS on a mobile GPU with a compact model.

03

Replaces tri-plane encoding with learned embeddings for better efficiency.

Abstract

Real-time talking head synthesis increasingly relies on deformable 3D Gaussian Splatting (3DGS) due to its low latency. Tri-planes are the standard choice for encoding Gaussians prior to deformation, since they provide a continuous domain with explicit spatial relationships. However, tri-plane representations are limited by grid resolution and by approximation errors introduced by projecting 3D volumetric fields onto 2D subspaces. Recent work has shown the superiority of learnt embeddings for driving temporal deformations in 4D scene reconstruction. We introduce EmbedTalk, which shows how such embeddings can be leveraged for modelling speech deformations in talking head synthesis. Through comprehensive experiments, we show that EmbedTalk outperforms existing 3DGS-based methods in rendering quality, lip synchronisation, and motion consistency, while remaining competitive with…
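For contrast, a tri-plane encoder of the kind the abstract describes can be sketched as below. This is a generic illustration, not code from EmbedTalk or any specific prior method: a point in [-1, 1]^3 is projected onto the xy, xz and yz planes, each plane is an `R x R x C` feature grid sampled bilinearly, and the three features are summed (some methods concatenate instead). The function names `bilinear_sample` and `triplane_features`, the align-corners coordinate mapping and the summation are assumptions.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Sample an (R, R, C) feature plane at continuous coords u, v in [-1, 1]."""
    R = plane.shape[0]
    # Map [-1, 1] to pixel space [0, R - 1] (align-corners convention).
    x = (np.asarray(u, dtype=float) + 1.0) * 0.5 * (R - 1)
    y = (np.asarray(v, dtype=float) + 1.0) * 0.5 * (R - 1)
    # Lower cell corner, clipped so the +1 neighbour stays inside the grid.
    x0 = np.clip(np.floor(x).astype(int), 0, R - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, R - 2)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    # plane is indexed as plane[row = y, col = x, channel].
    f00 = plane[y0, x0]
    f01 = plane[y0, x0 + 1]
    f10 = plane[y0 + 1, x0]
    f11 = plane[y0 + 1, x0 + 1]
    top = f00 * (1 - wx) + f01 * wx
    bottom = f10 * (1 - wx) + f11 * wx
    return top * (1 - wy) + bottom * wy


def triplane_features(planes, points):
    """Project (N, 3) points onto the xy, xz and yz planes and sum the sampled features."""
    xy, xz, yz = planes
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (bilinear_sample(xy, x, y)
            + bilinear_sample(xz, x, z)
            + bilinear_sample(yz, y, z))
```

Because each plane stores only `R x R` vectors, features vary bilinearly within each cell of width `2 / (R - 1)`, and any 3D field is approximated as a sum of three 2D fields; these are the grid-resolution and projection limits mentioned above.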

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing