DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Ruibin Yuan, Yuxuan Wu, Jacob Li, Jaxter Kim

TL;DR
DeID-VC is a zero-shot voice conversion system that de-identifies speakers by converting real voices into pseudo speakers, improving privacy while preserving intelligibility.
Contribution
DeID-VC introduces a VAE-based Pseudo Speaker Generator (PSG) and two novel learning objectives that bridge the gap between training and inference in zero-shot voice conversion.
Findings
Improved intelligibility with 10% lower WER
Enhanced de-identification with 5% higher EER
Effective pseudo speaker assignment at speaker and utterance levels
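The findings above are stated in terms of word error rate (WER, lower means more intelligible speech) and equal error rate (EER, where a higher value for a speaker verifier means it is worse at recognizing the original speaker). This page does not say how the paper computed either metric, so the following is only a generic sketch of the standard definitions. The function names and the simple threshold sweep are assumptions made for illustration.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    genuine:  verifier scores for same-speaker trials
    impostor: verifier scores for different-speaker trials
    A trial is accepted when its score is >= the threshold.
    """
    best_gap = None
    best_rate = 0.0
    for threshold in sorted(set(genuine) | set(impostor)):
        false_reject = sum(s < threshold for s in genuine) / len(genuine)
        false_accept = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_rate = (false_accept + false_reject) / 2
    return best_rate
```

For example, `word_error_rate("the cat sat", "the dog sat")` counts one substitution over three reference words.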
Abstract
The widespread adoption of speech-based online services raises security and privacy concerns regarding the data that they use and share. If the data were compromised, attackers could exploit user speech to bypass speaker verification systems or even impersonate users. To mitigate this, we propose DeID-VC, a speaker de-identification system that converts a real speaker to pseudo speakers, thus removing or obfuscating the speaker-dependent attributes from a spoken voice. The key components of DeID-VC include a Variational Autoencoder (VAE) based Pseudo Speaker Generator (PSG) and a voice conversion Autoencoder (AE) under zero-shot settings. With the help of PSG, DeID-VC can assign unique pseudo speakers at speaker level or even at utterance level. Also, two novel learning objectives are added to bridge the gap between training and inference of zero-shot voice conversion. We present our…
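The difference between speaker-level and utterance-level pseudo speaker assignment can be illustrated with a small sketch. The abstract does not describe the PSG architecture, so everything here is an assumption: a fixed random linear map stands in for a trained VAE decoder, latents are drawn from a standard normal prior, and the dimensions and function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

LATENT_DIM = 8   # hypothetical latent size
EMBED_DIM = 16   # hypothetical speaker-embedding size

Decoder = Callable[[List[float]], List[float]]


def make_toy_decoder(seed: int = 0) -> Decoder:
    """Stand-in for a trained VAE decoder: a fixed random linear map."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
               for _ in range(EMBED_DIM)]

    def decode(z: List[float]) -> List[float]:
        return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

    return decode


def sample_pseudo_speaker(decode: Decoder, rng: random.Random) -> List[float]:
    """Draw a latent from N(0, I) and decode it into a pseudo speaker embedding."""
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    return decode(z)


def assign_pseudo_speakers(
    utterances: List[Tuple[str, str]],
    decode: Decoder,
    level: str = "speaker",
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Map each utterance id to a pseudo speaker embedding.

    utterances: (speaker_id, utterance_id) pairs.
    level="speaker":   all utterances of one real speaker share one pseudo speaker.
    level="utterance": every utterance gets a freshly sampled pseudo speaker.
    """
    if level not in ("speaker", "utterance"):
        raise ValueError("level must be 'speaker' or 'utterance'")
    rng = random.Random(seed)
    per_speaker: Dict[str, List[float]] = {}
    assignment: Dict[str, List[float]] = {}
    for speaker_id, utterance_id in utterances:
        if level == "speaker":
            if speaker_id not in per_speaker:
                per_speaker[speaker_id] = sample_pseudo_speaker(decode, rng)
            assignment[utterance_id] = per_speaker[speaker_id]
        else:
            assignment[utterance_id] = sample_pseudo_speaker(decode, rng)
    return assignment


utterances = [("alice", "a1"), ("alice", "a2"), ("bob", "b1")]
decoder = make_toy_decoder()
by_speaker = assign_pseudo_speakers(utterances, decoder, level="speaker")
by_utterance = assign_pseudo_speakers(utterances, decoder, level="utterance")
```

In a real system, the resulting embedding would condition the voice conversion autoencoder in place of a real target speaker's embedding. That step is omitted here because the abstract gives no interface for it.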
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Hate Speech and Cyberbullying Detection
