PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment

Bin Wang; Yang Xu; Huan Zhao; Hao Zhang; Zixing Zhang

arXiv:2512.22602 · cs.CV · December 30, 2025


TL;DR

PTalker is a novel framework for personalized 3D talking head animation that disentangles speaking style from speech and aligns audio and facial motion modalities to produce realistic, style-preserving animations.

Contribution

It introduces a style disentanglement approach and a three-level modality alignment mechanism to enhance personalization and lip-sync accuracy in 3D talking head generation.

Findings

1. Outperforms state-of-the-art methods in realism and style preservation.
2. Achieves high lip-synchronization accuracy through multi-level alignment.
3. Generates personalized 3D talking heads that match individual speaking styles.

Abstract

Speech-driven 3D talking head generation aims to produce lifelike facial animations precisely synchronized with speech. While considerable progress has been made in achieving high lip-synchronization accuracy, existing methods largely overlook the intricate nuances of individual speaking styles, which limits personalization and realism. In this work, we present a novel framework for personalized 3D talking head animation, called "PTalker". This framework preserves speaking style by disentangling style from audio and facial motion sequences, and it improves lip-synchronization accuracy through a three-level alignment mechanism between the audio and mesh modalities. Specifically, to effectively disentangle style and content, we design disentanglement constraints that encode the driving audio and motion sequences into distinct style and content spaces, strengthening the speaking style representation. To…
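The abstract is truncated before it specifies what the disentanglement constraints are, so the following is only a minimal sketch of how such constraints are commonly written, not PTalker's actual loss. It assumes two hypothetical terms: an orthogonality penalty that pushes each style code away from its paired content code, and an agreement term that makes the style code extracted from audio match the one extracted from motion. All function names and the weights `lam_orth` and `lam_style` are illustrative assumptions; plain Python lists stand in for the tensors a real model would use.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def orthogonality_loss(style: Sequence[Vector], content: Sequence[Vector]) -> float:
    """Hypothetical constraint: mean squared cosine similarity between paired
    style and content codes. 0 means every style code is orthogonal to its
    content code, i.e. the two spaces share no direction."""
    if len(style) != len(content) or not style:
        raise ValueError("style and content batches must be equal and non-empty")
    total = 0.0
    for s, c in zip(style, content):
        cos = sum(a * b for a, b in zip(l2_normalize(s), l2_normalize(c)))
        total += cos * cos
    return total / len(style)


def style_agreement_loss(audio_style: Sequence[Vector], motion_style: Sequence[Vector]) -> float:
    """Hypothetical constraint: mean squared Euclidean distance between the
    style code taken from audio and the one taken from facial motion."""
    if len(audio_style) != len(motion_style) or not audio_style:
        raise ValueError("style batches must be equal and non-empty")
    total = 0.0
    for a, m in zip(audio_style, motion_style):
        total += sum((x - y) ** 2 for x, y in zip(a, m))
    return total / len(audio_style)


def disentanglement_loss(audio_style, audio_content, motion_style, motion_content,
                         lam_orth: float = 1.0, lam_style: float = 1.0) -> float:
    """Weighted sum of the two assumed terms (weights are illustrative)."""
    orth = (orthogonality_loss(audio_style, audio_content)
            + orthogonality_loss(motion_style, motion_content))
    return lam_orth * orth + lam_style * style_agreement_loss(audio_style, motion_style)


if __name__ == "__main__":
    # Toy batch of one frame: style codes match across modalities and are
    # orthogonal to their content codes, so the combined loss is zero.
    print(disentanglement_loss([[1.0, 0.0]], [[0.0, 2.0]],
                               [[1.0, 0.0]], [[0.0, -3.0]]))  # 0.0
```

Squared cosine is used rather than raw cosine so that anti-aligned codes are penalized as much as aligned ones; only true orthogonality drives the term to zero.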


Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing