GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting

Anushka Agarwal; Muhammad Yusuf Hassan; Talha Chafekar

arXiv:2505.01928·cs.CV·May 6, 2025

TL;DR

GenSync is a unified framework that synthesizes lip-synced videos for multiple speakers using 3D Gaussian Splatting. A Disentanglement Module separates identity-specific features from audio representations, and the framework trains 6.8x faster than state-of-the-art models while maintaining high lip-sync accuracy and visual quality.

Contribution

It introduces a multi-identity lip-sync framework with a disentanglement module, reducing training time and maintaining high visual and lip-sync quality.

Findings

01. Achieves 6.8x faster training than state-of-the-art models.

02. Maintains high lip-sync accuracy and visual quality across multiple identities.

03. Uses 3D Gaussian Splatting for efficient multi-subject video synthesis.

Abstract

We introduce GenSync, a novel framework for multi-identity lip-synced video synthesis using 3D Gaussian Splatting. Unlike most existing 3D methods that require training a new model for each identity, GenSync learns a unified network that synthesizes lip-synced videos for multiple speakers. By incorporating a Disentanglement Module, our approach separates identity-specific features from audio representations, enabling efficient multi-identity video synthesis. This design reduces computational overhead and achieves 6.8x faster training compared to state-of-the-art models, while maintaining high lip-sync accuracy and visual quality.
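The abstract does not describe the internals of the Disentanglement Module, so the following is only a toy sketch of the general idea it names: identity information lives in a small per-speaker code, while the audio path and output weights are shared by every speaker. All names, dimensions, and the additive fusion step (`identity_table`, `W_id`, `W_audio`, `W_out`, `predict_offsets`) are hypothetical and are not taken from the paper. The weights are random rather than trained.

```python
import math
import random

random.seed(0)

# Hypothetical sizes chosen only for illustration.
NUM_IDENTITIES = 3   # number of speakers handled by the one shared model
ID_DIM = 4           # size of each per-speaker identity code
AUDIO_DIM = 6        # size of a per-frame audio feature vector
HIDDEN_DIM = 8       # size of the fused hidden representation
NUM_GAUSSIANS = 5    # number of 3D Gaussians whose positions are adjusted


def random_matrix(rows, cols, scale=0.1):
    """Return a rows x cols matrix of Gaussian noise (stand-in for learned weights)."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(vector, matrix):
    """Multiply a row vector of length `rows` by a rows x cols matrix."""
    cols = len(matrix[0])
    return [sum(v * row[c] for v, row in zip(vector, matrix)) for c in range(cols)]


# Identity-specific parameters: one code per speaker.
identity_table = random_matrix(NUM_IDENTITIES, ID_DIM, scale=1.0)

# Shared parameters: used unchanged for every speaker.
W_id = random_matrix(ID_DIM, HIDDEN_DIM)
W_audio = random_matrix(AUDIO_DIM, HIDDEN_DIM)
W_out = random_matrix(HIDDEN_DIM, NUM_GAUSSIANS * 3)


def predict_offsets(identity_index, audio_features):
    """Return one [dx, dy, dz] offset per Gaussian for a single frame."""
    # The identity and audio inputs are encoded by separate projections...
    id_code = matvec(identity_table[identity_index], W_id)
    audio_code = matvec(audio_features, W_audio)
    # ...and only combined at the fusion step (assumed additive here).
    hidden = [math.tanh(i + a) for i, a in zip(id_code, audio_code)]
    flat = matvec(hidden, W_out)
    return [flat[k:k + 3] for k in range(0, len(flat), 3)]
```

In this sketch, supporting another speaker means adding one row to `identity_table` rather than training a separate network, which is the kind of saving a unified multi-identity model aims for. Calling `predict_offsets(0, audio)` and `predict_offsets(1, audio)` with the same `audio` vector yields different offsets because only the identity code changes.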


Taxonomy

Topics: Speech and Audio Processing · Music Technology and Sound Studies · Music and Audio Processing