Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference

Yanzhe Zhang; Zhonghao Bi; Feiyang Xiao; Xuefeng Yang; Qiaoxi Zhu; Jian Guan

arXiv:2412.19068·eess.AS·January 14, 2025

TL;DR

This paper presents DA-SID, an attacker system that combines a data-augmentation-enhanced feature representation with a PLDA-based speaker identity difference classifier to verify speakers from anonymized speech, securing a top-5 ranking in the First VoicePrivacy Attacker Challenge at ICASSP 2025.

Contribution

The paper introduces an attacker framework that pairs data augmentation (data fusion and SpecAugment) with PLDA-based enhancement of speaker identity differences to attack voice anonymization systems.

Findings

01

Outperforms baseline in speaker verification accuracy

02

Achieves a top-5 ranking in the First VoicePrivacy Attacker Challenge (ICASSP 2025)

03

Demonstrates robustness against various anonymization methods

Abstract

This study focuses on the First VoicePrivacy Attacker Challenge within the ICASSP 2025 Signal Processing Grand Challenge, which aims to develop speaker verification systems capable of determining whether two anonymized speech signals are from the same speaker. However, differences between feature distributions of original and anonymized speech complicate this task. To address this challenge, we propose an attacker system that combines Data Augmentation enhanced feature representation and Speaker Identity Difference enhanced classifier to improve verification performance, termed DA-SID. Specifically, data augmentation strategies (i.e., data fusion and SpecAugment) are utilized to mitigate feature distribution gaps, while probabilistic linear discriminant analysis (PLDA) is employed to further enhance speaker identity difference. Our system significantly outperforms the baseline,…
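
The abstract names SpecAugment as one of the augmentation strategies used to narrow the feature distribution gap. As a rough illustration of what that technique does, here is a minimal NumPy sketch of frequency and time masking on a feature matrix. The function name, the mask counts and widths, and the choice to fill masked regions with zeros are all assumptions made for this example; the source does not state the authors' settings, and this is not their implementation.

```python
import numpy as np


def spec_augment(features, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Apply SpecAugment-style masking to a (num_frames, num_bins) array.

    All default values are illustrative assumptions, not settings
    taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    augmented = features.copy()  # leave the caller's array untouched
    num_frames, num_bins = augmented.shape

    # Frequency masking: zero out a random band of consecutive bins.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(max_freq_width, num_bins) + 1))
        start = int(rng.integers(0, num_bins - width + 1))
        augmented[:, start:start + width] = 0.0

    # Time masking: zero out a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(max_time_width, num_frames) + 1))
        start = int(rng.integers(0, num_frames - width + 1))
        augmented[start:start + width, :] = 0.0

    return augmented


# Example: mask a dummy 100-frame, 80-bin feature matrix.
dummy = np.ones((100, 80))
masked = spec_augment(dummy, rng=np.random.default_rng(0))
```

In training, a fresh mask would typically be drawn for every utterance in every epoch, so the model never sees the same masked pattern twice.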

Taxonomy

Topics: Speech Recognition and Synthesis