OT-Talk: Animating 3D Talking Head with Optimal Transportation

Xinmu Wang; Xiang Gao; Xiyun Song; Heather Yu; Zongfang Lin; Liang Peng; Xianfeng Gu

arXiv:2505.01932 · cs.GR · May 13, 2025

TL;DR

OT-Talk introduces a method that combines optimal transportation with geometric mesh features extracted by Chebyshev Graph Convolution to improve the lip-sync accuracy and naturalness of audio-driven 3D talking head animation.

Contribution

This work is the first to apply optimal transportation and Chebyshev Graph Convolution to 3D talking head animation, improving both mesh feature modeling and lip-sync accuracy.
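The page does not describe how the Chebyshev Graph Convolution is configured, so the following is only a minimal, generic sketch of a ChebNet-style layer on a triangle mesh, not the OT-Talk implementation. It assumes the symmetric normalized Laplacian L = I - D^-1/2 A D^-1/2, the common approximation lambda_max = 2 for the scaled Laplacian, and dense per-order weight matrices W_k; all function names (`mesh_adjacency`, `apply_scaled_laplacian`, `cheb_conv`) are hypothetical and written in plain Python for readability.

```python
import math
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]  # rows = vertices, columns = features


def mesh_adjacency(num_vertices: int, faces: Sequence[Tuple[int, int, int]]) -> List[Set[int]]:
    """Undirected vertex neighbourhoods built from the edges of each triangle."""
    nbrs: List[Set[int]] = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def apply_scaled_laplacian(nbrs: List[Set[int]], x: Matrix) -> Matrix:
    """Compute L~ x, where L~ = 2L / lambda_max - I and L = I - D^-1/2 A D^-1/2.
    With the assumed lambda_max = 2 this reduces to -D^-1/2 A D^-1/2 x."""
    deg = [len(n) for n in nbrs]
    out: Matrix = []
    for i, n in enumerate(nbrs):
        row = [0.0] * len(x[i])
        for j in n:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            for f in range(len(row)):
                row[f] -= w * x[j][f]
        out.append(row)
    return out


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Dense (V x F_in) @ (F_in x F_out) product."""
    return [[sum(xi[k] * w[k][o] for k in range(len(w))) for o in range(len(w[0]))] for xi in x]


def add_into(acc: Matrix, y: Matrix) -> None:
    """Element-wise acc += y."""
    for ra, ry in zip(acc, y):
        for f in range(len(ra)):
            ra[f] += ry[f]


def cheb_conv(nbrs: List[Set[int]], x: Matrix, weights: List[Matrix]) -> Matrix:
    """y = sum_k T_k(L~) x W_k via the Chebyshev recurrence
    T_0 x = x, T_1 x = L~ x, T_k x = 2 L~ T_{k-1} x - T_{k-2} x."""
    out = matmul(x, weights[0])
    if len(weights) == 1:
        return out
    t_prev, t_cur = x, apply_scaled_laplacian(nbrs, x)
    add_into(out, matmul(t_cur, weights[1]))
    for w_k in weights[2:]:
        lt = apply_scaled_laplacian(nbrs, t_cur)
        t_next = [[2.0 * a - b for a, b in zip(ra, rb)] for ra, rb in zip(lt, t_prev)]
        add_into(out, matmul(t_next, w_k))
        t_prev, t_cur = t_cur, t_next
    return out


if __name__ == "__main__":
    # Toy tetrahedron: vertex coordinates serve as the input features.
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # K = 3 Chebyshev orders mapping 3 input features to 2 outputs (arbitrary weights).
    weights = [[[0.1 * (k + f + o) for o in range(2)] for f in range(3)] for k in range(3)]
    print(cheb_conv(mesh_adjacency(4, faces), coords, weights))
```

Because each order k mixes information from vertices up to k edges away, the layer captures local surface structure rather than treating vertex coordinates independently, which is the kind of geometric feature the contribution statement refers to.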

Findings

01. Outperforms state-of-the-art methods on quantitative metrics.

02. Achieves more natural and coherent facial animations.

03. Validated by a user perception study with 20 volunteers.

Abstract

Animating 3D head meshes using audio inputs has significant applications in AR/VR, gaming, and entertainment through 3D avatars. However, bridging the modality gap between speech signals and facial dynamics remains a challenge, often resulting in incorrect lip syncing and unnatural facial movements. To address this, we propose OT-Talk, the first approach to leverage optimal transportation to optimize the learning model in talking head animation. Building on existing learning frameworks, we utilize a pre-trained HuBERT model to extract audio features and a transformer model to process temporal sequences. Unlike previous methods that focus solely on vertex coordinates or displacements, we introduce Chebyshev Graph Convolution to extract geometric features from triangulated meshes. To measure mesh dissimilarities, we go beyond traditional mesh reconstruction errors and velocity differences…
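The abstract is cut off before it explains how optimal transportation enters the training objective, so the following is only a generic sketch under stated assumptions, not the authors' loss. It implements the two traditional terms the abstract names (per-vertex reconstruction error and frame-to-frame velocity difference) alongside an entropy-regularized optimal transportation cost between predicted and ground-truth vertex sets. The OT term uses log-domain Sinkhorn iterations, a swapped-in technique; uniform vertex weights, a squared Euclidean ground cost, `eps`, `iters`, and all function names are hypothetical choices.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
Frame = List[List[float]]  # one mesh: V vertices x 3 coordinates


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def reconstruction_loss(pred: List[Frame], gt: List[Frame]) -> float:
    """Mean squared per-vertex error over all frames (assumes vertex correspondence)."""
    total, count = 0.0, 0
    for fp, fg in zip(pred, gt):
        for p, g in zip(fp, fg):
            total += sq_dist(p, g)
            count += 1
    return total / count


def velocity_loss(pred: List[Frame], gt: List[Frame]) -> float:
    """Mean squared error between frame-to-frame vertex displacements."""
    total, count = 0.0, 0
    for t in range(1, len(pred)):
        for p1, p0, g1, g0 in zip(pred[t], pred[t - 1], gt[t], gt[t - 1]):
            vp = [a - b for a, b in zip(p1, p0)]
            vg = [a - b for a, b in zip(g1, g0)]
            total += sq_dist(vp, vg)
            count += 1
    return total / count if count else 0.0


def _logsumexp(vals: List[float]) -> float:
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def sinkhorn_cost(x: Frame, y: Frame, eps: float = 0.05, iters: int = 200) -> float:
    """Entropy-regularized OT cost <P, C> between two point sets with uniform
    weights, using log-domain Sinkhorn updates of the dual potentials f, g."""
    n, m = len(x), len(y)
    c = [[sq_dist(p, q) for q in y] for p in x]
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Enforce row marginals a_i, then column marginals b_j.
        f = [eps * log_a - eps * _logsumexp([(g[j] - c[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [eps * log_b - eps * _logsumexp([(f[i] - c[i][j]) / eps for i in range(n)])
             for j in range(m)]
    # Transport plan P_ij = exp((f_i + g_j - C_ij) / eps).
    return sum(math.exp((f[i] + g[j] - c[i][j]) / eps) * c[i][j]
               for i in range(n) for j in range(m))
```

Unlike the reconstruction and velocity terms, this OT cost does not depend on vertex ordering: it compares the two meshes as distributions of points and finds the cheapest way to move one onto the other. How OT-Talk actually weights or combines such terms is not stated in the truncated abstract.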

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Focus · Convolution