Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework

Dogucan Yaman; Fevziye Irem Eyiokur; Hazım Kemal Ekenel; Alexander Waibel

arXiv:2511.08613 · cs.CV · February 11, 2026

TL;DR

This paper presents a comprehensive evaluation framework for detecting and quantifying lip leakage in talking face generation models, addressing limitations of standard metrics and test setups.

Contribution

It introduces a systematic, model-agnostic evaluation methodology with new metrics and test setups to better assess identity leakage in talking face synthesis.

Findings

01

The framework effectively detects lip leakage across different models.

02

Derived metrics provide quantitative measures of lip-sync fidelity.

03

The study shows how the choice of identity reference affects leakage.

Abstract

Video editing-based talking face generation aims to preserve video details such as pose, lighting, and gestures while modifying only lip motion, often using an identity reference image to maintain speaker consistency. However, this mechanism can introduce lip leakage, where generated lips are influenced by the reference image rather than solely by the driving audio. Such leakage is difficult to detect with standard metrics and conventional test setups. To address this, we propose a systematic evaluation methodology to analyze and quantify lip leakage. Our framework employs three complementary test setups: silent-input generation, mismatched audio-video pairing, and matched audio-video synthesis. We also introduce derived metrics including lip-sync discrepancy and silent-audio-based lip-sync scores. In addition, we study how different identity reference selections affect leakage,…

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis