Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars

NVIDIA: Chaeyeon Chung; Ilya Fedorov; Michael Huang; Aleksey Karmanov; Dmitry Korobchenko; Roger Ribera; Yeongho Seol

arXiv:2508.16401 · cs.GR · August 25, 2025

TL;DR

Audio2Face-3D is a real-time system that generates realistic facial animation for digital avatars from audio input. Its networks, SDK, training framework, and example dataset are open-sourced to support interactive avatars and facial animation authoring.

Contribution

This paper introduces NVIDIA Audio2Face-3D, a comprehensive system with open-source tools for audio-driven facial animation of digital avatars, including data, architecture, and evaluation methods.

Findings

01 Enables real-time facial animation for avatars
02 Provides open-source SDK and training framework
03 Facilitates realistic avatar interaction

Abstract

Audio-driven facial animation presents an effective solution for animating digital avatars. In this paper, we detail the technical aspects of NVIDIA Audio2Face-3D, including data acquisition, network architecture, retargeting methodology, evaluation metrics, and use cases. The Audio2Face-3D system enables real-time interaction between human users and interactive avatars, and facilitates facial animation authoring for game characters. To assist digital avatar creators and game developers in generating realistic facial animations, we have open-sourced Audio2Face-3D networks, SDK, training framework, and example dataset.
