3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars

Zhongju Wang; Zhenhong Sun; Beier Wang; Yifu Wang; Daoyi Dong; Huadong Mo; Hongdong Li

arXiv:2602.10516 · cs.CV · April 6, 2026

TL;DR

3DXTalker is a framework for generating expressive 3D talking avatars. It unifies identity preservation, lip sync, emotion, and spatial head dynamics through data-curated identity modeling, audio-rich representations, and a flow-matching transformer.

Contribution

It introduces a scalable identity modeling pipeline, rich audio and emotional cues, and a flow-matching transformer for coherent facial and head dynamics, advancing 3D avatar expressivity.
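For readers unfamiliar with flow matching, the following is a minimal sketch of the generic training target and sampler, not the authors' implementation. It assumes a linear interpolation path between Gaussian noise and a motion sequence. The array layout `(batch, frames, dims)`, the function names, and the `predict_velocity` callable (a stand-in for the transformer) are all hypothetical. Audio and emotion conditioning are omitted.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Return a point on the straight path from noise x0 to data x1,
    plus the velocity a model would be trained to predict there."""
    # Broadcast one time value per batch item over the remaining axes.
    t = t.reshape(-1, *([1] * (x1.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * x1   # linear interpolation path
    v_target = x1 - x0              # constant velocity along that path
    return x_t, v_target

def flow_matching_loss(predict_velocity, x1, rng):
    """Mean squared error between predicted and target velocity.
    predict_velocity(x_t, t) is a hypothetical stand-in for the model."""
    x0 = rng.standard_normal(x1.shape)       # noise sample
    t = rng.uniform(size=x1.shape[0])        # one time in [0, 1) per item
    x_t, v_target = flow_matching_pair(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(predict_velocity, x0, steps=10):
    """Integrate the learned velocity field from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * predict_velocity(x, t)
    return x
```

At inference time, a trained model would replace `predict_velocity`, and `euler_sample` would turn a noise tensor into a motion sequence. The number of Euler steps trades speed against accuracy.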

Findings

01. Achieves superior lip synchronization and nuanced expressions.

02. Enables natural head-pose motion with stylized control.

03. Outperforms existing methods in 3D talking avatar generation.

Abstract

Audio-driven 3D talking avatar generation is increasingly important in virtual communication, digital humans, and interactive media, where avatars must preserve identity, synchronize lip motion with speech, express emotion, and exhibit lifelike spatial dynamics, collectively defining a broader objective of expressivity. However, achieving this remains challenging due to insufficient training data with limited subject identities, narrow audio representations, and restricted explicit controllability. In this paper, we propose 3DXTalker, an expressive 3D talking avatar framework built on data-curated identity modeling, audio-rich representations, and spatial dynamics controllability. 3DXTalker enables scalable identity modeling via a 2D-to-3D data curation pipeline and disentangled representations, alleviating data scarcity and improving identity generalization. Then, we introduce frame-wise amplitude…

Code & Models

Datasets

EngineeringAI-LAB/3DTalkingDataset
dataset· 266 dl