Toward Fine-Grained Facial Control in 3D Talking Head Generation

Shaoyang Xie; Xiaofeng Cong; Baosheng Yu; Zhipeng Gui; Jie Gui; Yuan Yan Tang; James Tin-Yau Kwok

arXiv:2602.09736 · cs.CV · February 11, 2026

TL;DR

This paper introduces FG-3DGS, a novel framework for fine-grained, high-fidelity 3D talking head generation that improves lip synchronization and facial detail control using frequency-aware modeling and post-rendering refinement.

Contribution

The paper proposes a frequency-aware disentanglement strategy and a high-frequency refinement mechanism to enhance control and realism in 3D talking head generation.

Findings

01 Outperforms state-of-the-art methods in lip synchronization accuracy.

02 Achieves high-fidelity facial detail and temporal consistency.

03 Demonstrates robustness across multiple datasets.

Abstract

Audio-driven talking head generation is a core component of digital avatars, and 3D Gaussian Splatting has shown strong performance in real-time rendering of high-fidelity talking heads. However, achieving precise control over fine-grained facial movements remains a significant challenge, particularly due to lip-synchronization inaccuracies and facial jitter, both of which can contribute to the uncanny valley effect. To address these challenges, we propose Fine-Grained 3D Gaussian Splatting (FG-3DGS), a novel framework that enables temporally consistent and high-fidelity talking head generation. Our method introduces a frequency-aware disentanglement strategy to explicitly model facial regions based on their motion characteristics. Low-frequency regions, such as the cheeks, nose, and forehead, are jointly modeled using a standard MLP, while high-frequency regions, including the eyes and…
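The low-frequency branch described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-Gaussian region labels, the audio feature vector, the choice of positional offsets as the MLP output, and every function name are assumptions made for illustration. Only the three low-frequency regions named in the abstract are routed through the shared MLP. High-frequency regions are left untouched here because the abstract is truncated before it describes how they are handled.

```python
import numpy as np

# Regions the abstract names as low-frequency; all other labels are skipped.
LOW_FREQ_REGIONS = ("cheeks", "nose", "forehead")


def init_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Create weights for a two-layer MLP (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp_forward(params, x):
    """Linear -> ReLU -> Linear."""
    hidden = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    return hidden @ params["W2"] + params["b2"]


def low_frequency_offsets(positions, region_labels, audio_feature, params):
    """Predict position offsets for Gaussians in low-frequency regions.

    Assumption: the MLP input is each Gaussian's position concatenated with
    one audio feature vector shared by all Gaussians in the frame.
    """
    mask = np.isin(region_labels, LOW_FREQ_REGIONS)
    offsets = np.zeros_like(positions)
    if mask.any():
        audio = np.broadcast_to(audio_feature, (int(mask.sum()), audio_feature.shape[0]))
        inputs = np.concatenate([positions[mask], audio], axis=1)
        offsets[mask] = mlp_forward(params, inputs)
    return offsets, mask


# Toy data: five Gaussians, two of them in a high-frequency region.
positions = np.random.default_rng(1).normal(size=(5, 3))
region_labels = np.array(["cheeks", "nose", "eyes", "forehead", "eyes"])
audio_feature = np.ones(4)
params = init_mlp(in_dim=3 + 4, hidden_dim=16, out_dim=3)
offsets, mask = low_frequency_offsets(positions, region_labels, audio_feature, params)
```

In this sketch a single set of weights serves all low-frequency regions, which mirrors the "jointly modeled" wording of the abstract. Gaussians labeled `eyes` keep a zero offset and would be passed to whatever high-frequency branch the full paper defines.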

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing