Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation

Dogucan Yaman; Fevziye Irem Eyiokur; Leonard Bärmann; Hazım Kemal Ekenel; Alexander Waibel

arXiv:2507.20953·cs.CV·July 29, 2025

TL;DR

This paper introduces a mask-free method for audio-driven talking face generation that improves visual quality and identity preservation. Instead of masking the input face or relying on an identity reference image, it first transforms the input face to have a closed mouth and then performs lip adaptation.

Contribution

The proposed approach eliminates the need for masked input images and identity references, improving identity preservation and visual quality in talking face generation.

Findings

01 Outperforms state-of-the-art methods on the LRS2 and HDTF datasets.

02 Maintains high visual quality and accurate lip synchronization.

03 Reduces information loss and identity mismatch issues.

Abstract

Audio-Driven Talking Face Generation aims at generating realistic videos of talking faces, focusing on accurate audio-lip synchronization without deteriorating any identity-related visual details. Recent state-of-the-art methods are based on inpainting, meaning that the lower half of the input face is masked, and the model fills the masked region by generating lips aligned with the given audio. Hence, to preserve identity-related visual details from the lower half, these approaches additionally require an unmasked identity reference image randomly selected from the same video. However, this common masking strategy suffers from (1) information loss in the input faces, significantly affecting the networks' ability to preserve visual quality and identity details, (2) variation between identity reference and input image degrading reconstruction performance, and (3) the identity reference…
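
To make the inpainting setup described in the abstract concrete, the following is a minimal sketch of how such a baseline might build its inputs: the lower half of the target frame is masked, and an unmasked identity reference is drawn at random from the same video. This illustrates the masking strategy the paper argues against, not the authors' mask-free method. The function name `build_inpainting_input`, the `(T, H, W, C)` frame layout, zero-filling as the mask, and excluding the target frame from the reference candidates are all assumptions made for illustration.

```python
import numpy as np


def build_inpainting_input(frames, target_idx, rng):
    """Build one (masked target, identity reference) pair.

    frames: array of shape (T, H, W, C) holding face crops from ONE video
            (layout is an assumption of this sketch).
    Returns the masked target, the reference frame, and the reference index.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to pick a separate reference")

    target = frames[target_idx]
    masked = target.copy()          # leave the original frame untouched
    half = target.shape[0] // 2
    masked[half:] = 0               # assumed mask: zero out the lower half

    # Identity reference: a random frame from the same video.
    # Excluding the target frame itself is an assumption of this sketch.
    candidates = [i for i in range(len(frames)) if i != target_idx]
    ref_idx = int(rng.choice(candidates))
    return masked, frames[ref_idx], ref_idx


# Hypothetical usage with random data standing in for face crops.
rng = np.random.default_rng(0)
video = rng.integers(1, 255, size=(4, 8, 8, 3), dtype=np.uint8)
masked_face, identity_ref, ref_index = build_inpainting_input(video, 1, rng)
```

The sketch shows where the first two issues listed in the abstract come from: everything below row `half` of the target is discarded, and the reference frame may differ from the target in pose or expression because it comes from another point in the video.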
