HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
Zhisheng Zhang, Pengyang Huang

TL;DR
HiddenSpeaker introduces a method to embed imperceptible perturbations in speech data, making it unlearnable for speaker verification systems while maintaining human perceptual quality, thus protecting privacy and preventing unauthorized model training.
Contribution
The paper proposes HiddenSpeaker, a novel framework that generates imperceptible, unlearnable audio samples using a simplified error-minimizing method and a hybrid perceptual optimization, enhancing privacy protection.
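As a rough illustration of the error-minimizing idea, the sketch below optimizes a bounded perturbation that lowers the training loss of a fixed model, so the perturbed samples carry little for a learner to fit. It is not the authors' implementation. The linear softmax classifier, the random feature vectors standing in for speech, the function names, and the values of `eps`, `step`, and `iters` are all assumptions made for the example. The paper's hybrid perceptual objective is not reproduced; a simple L-infinity bound stands in for it.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, x, y):
    """Per-sample cross-entropy of a linear softmax classifier with weights W."""
    p = softmax(x @ W)
    return -np.log(p[np.arange(len(y)), y] + 1e-12)

def error_minimizing_noise(W, x, y, eps=0.05, step=0.01, iters=50):
    """Hypothetical helper: find delta with |delta| <= eps that LOWERS the loss.

    The model W is held fixed and only delta is updated, which is one
    plausible reading of "single-level" (an assumption, not the paper's text).
    """
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[1])[y]
    for _ in range(iters):
        p = softmax((x + delta) @ W)
        grad = (p - onehot) @ W.T               # d(loss)/d(input) for softmax cross-entropy
        delta = delta - step * np.sign(grad)    # signed descent step: minimize the loss
        delta = np.clip(delta, -eps, eps)       # keep the perturbation small
    return delta

# Toy usage: random "features" in place of real speech, random speaker labels.
rng = np.random.default_rng(0)
n, d, k = 8, 16, 4
x = rng.normal(size=(n, d))
y = rng.integers(0, k, size=n)
W = rng.normal(size=(d, k))

delta = error_minimizing_noise(W, x, y)
loss_clean = cross_entropy(W, x, y).mean()
loss_perturbed = cross_entropy(W, x + delta, y).mean()
print(loss_clean, loss_perturbed)
```

The update is the mirror image of an adversarial attack: the step follows the negative gradient sign, so the loss on the perturbed samples falls rather than rises.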
Findings
Successfully deceives state-of-the-art speaker verification models
Perturbations are highly imperceptible to human listeners
Demonstrates strong transferability across different models
Abstract
In recent years, remarkable advances in deep neural networks have brought tremendous convenience. However, training a highly effective model requires a large number of samples, which creates serious risks such as unauthorized exploitation and privacy leakage. In response, we propose a framework named HiddenSpeaker, which embeds imperceptible perturbations in training speech samples and renders them unlearnable for deep-learning-based speaker verification systems that are trained efficiently on large-scale speaker data. HiddenSpeaker uses a simplified error-minimizing method named Single-Level Error-Minimizing (SLEM) to generate specific and effective perturbations. Additionally, a hybrid objective function is employed for human perceptual optimization, ensuring the perturbation is imperceptible to human listeners. We conduct…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
