Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation

Hieu-Thi Luong; Inbal Rimon; Haim Permuter; Kong Aik Lee; Eng Siong Chng

arXiv:2507.03468 · cs.SD · September 1, 2025

Open Access

TL;DR

This paper critically evaluates partial audio deepfake localization, highlighting the limitations of current metrics like EER, and proposes a more comprehensive evaluation framework emphasizing out-of-domain performance and real-world applicability.

Contribution

It reframes the localization task as sequential anomaly detection and advocates for threshold-dependent metrics, revealing the poor generalization of current models to out-of-domain data.
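Framing localization as sequential anomaly detection implies that a deployed system commits to a decision threshold and reports which spans of an utterance are flagged. The following is a minimal sketch of that step, not the paper's implementation. It assumes per-frame scores where higher means "more likely fake", a 20 ms frame length (matching the 20-ms resolution mentioned in the abstract), and a hypothetical helper named `fake_segments`.

```python
from typing import List, Tuple

FRAME_SEC = 0.02  # assumed frame length: 20 ms


def fake_segments(scores: List[float], threshold: float = 0.5,
                  frame_sec: float = FRAME_SEC) -> List[Tuple[float, float]]:
    """Merge consecutive frames with score >= threshold into (start, end) spans in seconds."""
    segments = []
    start = None  # index of the first frame of the currently open span, if any
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a flagged span opens here
        elif score < threshold and start is not None:
            # the span closes at the first frame that falls below the threshold
            segments.append((round(start * frame_sec, 3), round(i * frame_sec, 3)))
            start = None
    if start is not None:
        # the utterance ended while a span was still open
        segments.append((round(start * frame_sec, 3),
                         round(len(scores) * frame_sec, 3)))
    return segments


# Synthetic scores for illustration only; not taken from the paper.
scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7, 0.6]
print(fake_segments(scores))
```

The output depends directly on the chosen threshold, which is why metrics computed at a fixed operating point describe this kind of system more faithfully than a threshold-free summary.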

Findings

01. Current models perform well in-domain but poorly out-of-domain.
02. Over-optimizing for in-domain EER can harm real-world performance.
03. Adding partially fake data to training improves detection accuracy.

Abstract

Partial audio deepfake localization poses unique challenges and remains underexplored compared to full-utterance spoofing detection. While recent methods report strong in-domain performance, their real-world utility remains unclear. In this analysis, we critically examine the limitations of current evaluation practices, particularly the widespread use of Equal Error Rate (EER), which often obscures generalization and deployment readiness. We propose reframing the localization task as a sequential anomaly detection problem and advocate for the use of threshold-dependent metrics such as accuracy, precision, recall, and F1-score, which better reflect real-world behavior. Specifically, we analyze the performance of the open-source Coarse-to-Fine Proposal Refinement Framework (CFPRF), which achieves a 20-ms EER of 7.61% on the in-domain PartialSpoof evaluation set, but 43.25% and 27.59% on…
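To make the contrast between EER and threshold-dependent metrics concrete, here is a small sketch using synthetic frame labels and scores (not the paper's data or code). It assumes label 1 marks a fake frame, higher scores mean "more likely fake", and both classes are present. The helper names `threshold_metrics` and `approx_eer` are illustrative, and the EER is approximated by a simple sweep over observed score values.

```python
from typing import Dict, List, Tuple


def confusion(labels: List[int], scores: List[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) when frames with score >= threshold are called fake."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(labels: List[int], scores: List[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 at one fixed decision threshold."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(labels),
            "precision": precision, "recall": recall, "f1": f1}


def approx_eer(labels: List[int], scores: List[float]) -> Tuple[float, float]:
    """Sweep observed scores as thresholds; return (EER estimate, threshold)
    at the point where false-alarm and miss rates are closest."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(labels, scores, t)
        far = fp / (fp + tn)  # bona fide frames flagged as fake
        frr = fn / (fn + tp)  # fake frames that were missed
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


labels = [0, 0, 0, 1, 1, 1]
in_domain = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # synthetic, well calibrated
shifted = [0.6, 0.7, 0.75, 0.8, 0.9, 0.95]   # synthetic, scores drift upward

for name, scores in [("in-domain", in_domain), ("shifted", shifted)]:
    eer, eer_threshold = approx_eer(labels, scores)
    print(name, "EER:", eer, "at threshold", eer_threshold)
    print(name, "metrics at 0.5:", threshold_metrics(labels, scores, 0.5))
```

In this toy case both score sets are perfectly separable, so the EER is the same for each, yet the shifted scores flag every frame as fake at the fixed 0.5 threshold. EER picks its threshold after seeing the labels, whereas a deployed system has to keep the threshold it was given.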

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Topic Modeling