HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System

Zhisheng Zhang; Pengyang Huang

arXiv:2405.15655·cs.SD·September 13, 2024

TL;DR

HiddenSpeaker introduces a method to embed imperceptible perturbations in speech data, making it unlearnable for speaker verification systems while maintaining human perceptual quality, thus protecting privacy and preventing unauthorized model training.

Contribution

The paper proposes HiddenSpeaker, a novel framework that generates imperceptible, unlearnable audio samples using a simplified error-minimizing method and a hybrid perceptual optimization, enhancing privacy protection.
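The general idea behind error-minimizing perturbations can be illustrated with a small sketch. This page does not give the details of the paper's method, so what follows is not the authors' implementation: it is a generic toy version that assumes a fixed linear softmax classifier as a stand-in for a speaker model, random feature vectors in place of audio, and an L-infinity bound `eps` as a crude proxy for imperceptibility. The names `error_minimizing_noise`, `eps`, `step`, and `iters` are hypothetical. The noise is updated by signed gradient descent so that the model's loss on the perturbed samples goes down, which is what leaves such samples with little left to learn from.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(W, x, y):
    # Mean cross-entropy of a linear softmax classifier with weights W.
    p = softmax(x @ W)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()


def error_minimizing_noise(W, x, y, eps=0.05, step=0.01, iters=50):
    """Toy error-minimizing noise for a FIXED linear model (an assumption).

    Returns delta with |delta| <= eps elementwise, chosen to lower the
    classification loss on x + delta.
    """
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[1])[y]
    for _ in range(iters):
        p = softmax((x + delta) @ W)
        # Gradient of the mean cross-entropy with respect to the inputs.
        grad = (p - onehot) @ W.T / len(y)
        # Step against the gradient (minimize the loss), then project
        # back into the L-infinity ball of radius eps.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta


# Toy usage: 16 random "utterance features", 8 dimensions, 3 "speakers".
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
x = rng.normal(size=(16, 8))
y = rng.integers(0, 3, size=16)

delta = error_minimizing_noise(W, x, y)
loss_clean = cross_entropy(W, x, y)
loss_perturbed = cross_entropy(W, x + delta, y)
```

In this toy setting `loss_perturbed` comes out lower than `loss_clean` while every entry of `delta` stays within `eps`. A real system would replace the linear model with a speaker verification network and the plain norm bound with a perceptual objective, as the paper describes.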

Findings

01

Successfully deceives state-of-the-art speaker verification models

02

Perturbations are highly imperceptible to human listeners

03

Demonstrates strong transferability across different models

Abstract

In recent years, remarkable advances in deep neural networks have brought tremendous convenience. However, training a highly effective model requires a substantial quantity of samples, which creates serious potential threats such as unauthorized exploitation and privacy leakage. In response, we propose a framework named HiddenSpeaker, which embeds imperceptible perturbations in training speech samples and renders them unlearnable for deep-learning-based speaker verification systems that use large numbers of speakers for efficient training. HiddenSpeaker uses a simplified error-minimizing method named Single-Level Error-Minimizing (SLEM) to generate specific and effective perturbations. Additionally, a hybrid objective function is employed for human perceptual optimization, ensuring the perturbation is imperceptible to human listeners. We conduct…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing