Evaluating Identity Leakage in Speaker De-Identification Systems

Seungmin Seo; Oleg Aulov; Afzal Godil; Kevin Mangold

arXiv:2508.14012·cs.SD·August 20, 2025

TL;DR

This paper introduces a benchmark that measures residual speaker identity leakage in de-identification systems. It shows that every evaluated system still leaks identity information, so current methods carry persistent privacy risks.

Contribution

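It proposes a benchmark that combines three complementary measures of identity leakage (equal error rate, cumulative match characteristic hit rate, and embedding-space similarity via canonical correlation analysis and Procrustes analysis) and uses them to expose the limitations of existing speaker de-identification systems.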

Findings

01

All evaluated systems leak identity information.

02

The best system performs only slightly better than random guessing.

03

The lowest-performing system reaches a 45% hit rate within the top 50 candidates on the cumulative match characteristic (CMC).
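
To make the top-50 hit rate concrete, here is a minimal sketch of how a CMC hit rate at rank k can be computed from similarity scores. The function name `cmc_hit_rate`, the score-matrix layout, and the toy numbers are illustrative assumptions, not the paper's code or data.

```python
from typing import Sequence


def cmc_hit_rate(
    scores: Sequence[Sequence[float]],
    true_idx: Sequence[int],
    k: int,
) -> float:
    """Fraction of probes whose true speaker ranks within the top k candidates.

    scores[i][j] is the similarity between probe i (a de-identified utterance)
    and gallery speaker j; true_idx[i] is the gallery index of probe i's real
    speaker. This layout is an assumption made for the sketch.
    """
    if not scores:
        raise ValueError("scores must contain at least one probe")
    hits = 0
    for row, target in zip(scores, true_idx):
        # Rank 1 is the best match. Ties are resolved in the attacker's
        # favor, because only strictly higher scores push the rank down.
        rank = 1 + sum(1 for s in row if s > row[target])
        if rank <= k:
            hits += 1
    return hits / len(scores)


# Toy data (hypothetical): 3 probes scored against a gallery of 3 speakers.
toy_scores = [
    [0.9, 0.1, 0.3],  # true speaker 0 ranks 1st
    [0.2, 0.4, 0.8],  # true speaker 1 ranks 2nd
    [0.5, 0.6, 0.1],  # true speaker 2 ranks 3rd
]
toy_truth = [0, 1, 2]
top1 = cmc_hit_rate(toy_scores, toy_truth, k=1)
top2 = cmc_hit_rate(toy_scores, toy_truth, k=2)
```

For reference, an attacker who ranks a gallery of N speakers uniformly at random has an expected hit rate of k/N at rank k, so a hit rate is only meaningful relative to the gallery size.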

Abstract

Speaker de-identification aims to conceal a speaker's identity while preserving intelligibility of the underlying speech. We introduce a benchmark that quantifies residual identity leakage with three complementary error rates: equal error rate, cumulative match characteristic hit rate, and embedding-space similarity measured via canonical correlation analysis and Procrustes analysis. Evaluation results reveal that all state-of-the-art speaker de-identification systems leak identity information. The highest performing system in our evaluation performs only slightly better than random guessing, while the lowest performing system achieves a 45% hit rate within the top 50 candidates based on CMC. These findings highlight persistent privacy risks in current speaker de-identification technologies.
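
As an illustration of the first metric named in the abstract, the sketch below computes an equal error rate from two lists of verification scores. The function name `equal_error_rate`, the simple threshold sweep, and the toy scores are assumptions chosen for clarity; the paper's evaluation pipeline may compute EER differently.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate the EER by sweeping every observed score as a threshold.

    genuine: scores for trials where both utterances come from the same speaker.
    impostor: scores for trials with different speakers.
    A trial is accepted when its score is >= the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap = float("inf")
    eer = 1.0
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostor trials accepted at this threshold.
        far = sum(1 for s in impostor if s >= t) / len(impostor)
        # False rejection rate: genuine trials rejected at this threshold.
        frr = sum(1 for s in genuine if s < t) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores (hypothetical), not taken from the paper.
separable = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4])
overlapping = equal_error_rate([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.75, 0.4])
chance = equal_error_rate([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4])
```

When the genuine and impostor score distributions are identical, the verifier cannot tell speakers apart and the EER sits at 0.5, which is the chance level for this kind of metric.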
