Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples
Meng Chen, Li Lu, Jiadi Yu, Yingying Chen, Zhongjie Ba, Feng Lin, Kui Ren

TL;DR
This paper introduces a novel voice de-identification system using adversarial examples that balance privacy and utility, preserving voice quality while preventing automatic speaker identification, with diverse target embedding capabilities.
Contribution
The study proposes a convolutional adversarial example that modulates perturbations into room impulse responses, together with a conditional variational auto-encoder for adaptive, diverse voice de-identification.
Findings
Achieves 98% successful de-identification on mainstream ASIs.
Maintains high voice perceptual quality with a mean opinion score of 4.48.
Demonstrates effective privacy-utility trade-off in voice data publishing.
Abstract
Faced with the threat of identity leakage during voice data publishing, users are caught in a privacy-utility dilemma when enjoying convenient voice services. Existing studies employ direct modification or text-based re-synthesis to de-identify users' voices, but these approaches result in inconsistent audibility in the presence of human participants. In this paper, we propose a voice de-identification system that uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, our system can protect user identity from exposure by Automatic Speaker Identification (ASI) while retaining voice perceptual quality for non-intrusive de-identification. Moreover, our system…
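The contrast the abstract draws between additive and convolutional adversarial examples can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, the synthetic tone used in place of a voice recording, and the `toy_rir` helper (decaying noise standing in for a room impulse response) are all assumptions made for the example. In the paper the impulse response carries a perturbation chosen to mislead ASI; that optimization step is not shown here.

```python
import numpy as np


def additive_example(voice, delta):
    """Typical additive adversarial example: x' = x + delta."""
    return voice + delta


def convolutional_example(voice, rir):
    """Convolutional example: x' = x * h (linear convolution), cropped to the input length."""
    return np.convolve(voice, rir, mode="full")[: len(voice)]


def toy_rir(length=2000, decay=400.0, seed=0):
    """Hypothetical stand-in for a room impulse response (assumption, not from the paper):
    exponentially decaying noise with a unit direct-path tap."""
    rng = np.random.default_rng(seed)
    h = 0.05 * rng.standard_normal(length) * np.exp(-np.arange(length) / decay)
    h[0] = 1.0  # direct path
    return h


# Synthetic 1-second "voice" at 16 kHz (a plain tone, used only for illustration).
sr = 16000
t = np.arange(sr) / sr
voice = 0.5 * np.sin(2 * np.pi * 220 * t)

# Additive form: noise is added sample by sample on top of the signal.
noise = 0.01 * np.random.default_rng(1).standard_normal(len(voice))
x_add = additive_example(voice, noise)

# Convolutional form: the signal is filtered as if played back in a reverberant room.
x_conv = convolutional_example(voice, toy_rir())
```

The design point is that the convolutional form alters the signal the way a room would, through filtering, rather than by layering independent noise over it, which is why the abstract expects it to be less perceptible.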
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
