GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting

Bo Chen; Shoukang Hu; Qi Chen; Chenpeng Du; Ran Yi; Yanmin Qian; Xie Chen

arXiv:2404.19040·cs.CV·May 1, 2024

Open Access

TL;DR

GSTalker is a novel 3D audio-driven talking face generation model that achieves fast training and real-time rendering by using Gaussian Splatting and deformation fields to synchronize facial movements with audio.

Contribution

The paper introduces GSTalker, which employs Gaussian Splatting and deformation fields for efficient, high-fidelity, audio-synchronized 3D talking face generation with significantly reduced training and rendering times.

Findings

01 Fast training within 40 minutes
02 Real-time rendering at 125 FPS
03 High-fidelity, audio-synchronized face generation

Abstract

We present GSTalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3 to 5 minute video as training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks, which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Gaussian deformation field to translate and transform 3D Gaussians to synchronize with audio information, in which a multi-resolution hashing grid-based tri-plane and a temporal smoothing module are incorporated to learn accurate deformation for fine-grained facial details. In addition, a pose-conditioned deformation field is designed to model the stabilized torso. To enable efficient optimization of the conditioned Gaussian deformation field, we initialize 3D Gaussians by learning a coarse static…
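The idea of a deformation field that translates and transforms 3D Gaussians from an audio signal can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the single linear layer standing in for the learned network, the 0.1 offset bound, and the choice to deform log-scales are all assumptions made for illustration, and the tri-plane hash encoding and temporal smoothing module mentioned in the abstract are omitted.

```python
import numpy as np

def toy_deformation_field(positions, audio_feature, weights):
    """Hypothetical stand-in for a learned deformation network.

    Maps each Gaussian centre plus a shared per-frame audio feature to
    small offsets for position (3), rotation quaternion (4), and
    log-scale (3). A real model would use a trained network here.
    """
    n = positions.shape[0]
    # Share the same audio feature across all Gaussians in this frame.
    audio = np.broadcast_to(audio_feature, (n, audio_feature.shape[0]))
    inputs = np.concatenate([positions, audio], axis=1)
    # tanh keeps every offset in (-0.1, 0.1); the bound is an assumption.
    out = 0.1 * np.tanh(inputs @ weights)
    return out[:, :3], out[:, 3:7], out[:, 7:10]

def deform_gaussians(positions, rotations, log_scales, audio_feature, weights):
    """Apply the predicted offsets to a set of static (canonical) Gaussians."""
    d_pos, d_rot, d_scale = toy_deformation_field(positions, audio_feature, weights)
    new_positions = positions + d_pos
    # Offset the quaternion, then renormalise so it stays a valid rotation.
    new_rotations = rotations + d_rot
    new_rotations /= np.linalg.norm(new_rotations, axis=1, keepdims=True)
    new_log_scales = log_scales + d_scale
    return new_positions, new_rotations, new_log_scales
```

In this sketch the static Gaussians play the role of the coarse initialization, and each audio frame produces a new deformed set that would then be passed to a Gaussian Splatting rasterizer, which is not shown here.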


Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Video Surveillance and Tracking Methods