DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

Ruibin Yuan; Yuxuan Wu; Jacob Li; Jaxter Kim

arXiv:2209.04530 · cs.SD · September 13, 2022

TL;DR

DeID-VC is a zero-shot voice conversion system that de-identifies speakers by converting their real voices into those of pseudo speakers, obscuring speaker identity while preserving speech intelligibility.

Contribution

It introduces a VAE-based Pseudo Speaker Generator (PSG) and two new training objectives that narrow the gap between training and inference in zero-shot voice conversion, advancing speaker de-identification.
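For readers less familiar with VAE-based generators, one common form of the objective such a generator is trained with is written out below. This is a generic VAE loss, not the paper's stated loss: it assumes the PSG models speaker embeddings $e$ with an encoder $q_\phi$, a decoder $g_\theta$, and a weight $\beta$ (all assumed names), and it does not include DeID-VC's two additional learning objectives, which this page does not describe.

```latex
% Training: encode a real speaker embedding e, reconstruct it, and keep q close to the prior
\mathcal{L}_{\mathrm{VAE}}(e)
  = \mathbb{E}_{q_\phi(z \mid e)}\!\left[\lVert e - g_\theta(z) \rVert_2^2\right]
  + \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid e) \,\Vert\, \mathcal{N}(0, I)\right)

% Reparameterization used to backpropagate through the sampling step
z = \mu_\phi(e) + \sigma_\phi(e) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

% Closed-form KL term for a diagonal Gaussian posterior
D_{\mathrm{KL}} = \tfrac{1}{2} \sum_{k} \left( \mu_k^2 + \sigma_k^2 - \log \sigma_k^2 - 1 \right)

% Inference: a pseudo speaker embedding is drawn from the prior, with no real speaker involved
\hat{e} = g_\theta(z), \qquad z \sim \mathcal{N}(0, I)
```

The last line is what makes such a generator useful for de-identification: pseudo speakers come from the learned prior rather than from any enrolled person.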

Findings

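01. Improved intelligibility: 10% lower word error rate (WER) than the baseline.

02. Stronger de-identification: 5% higher equal error rate (EER) than the baseline, meaning speaker verification systems are less able to link converted speech to the original speaker.

03. Effective pseudo speaker assignment at both the speaker level and the utterance level.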
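The two headline metrics can be made concrete with a small pure-Python sketch. WER is the word-level edit distance between a reference transcript and an ASR transcript of the converted speech, divided by the number of reference words. EER is the operating point of a speaker verification system at which the false acceptance rate equals the false rejection rate. In a de-identification test, target trials pair a real speaker's enrollment with that speaker's converted speech; if conversion removes the speaker's identity, target scores fall toward non-target scores and the EER rises. The function names below are hypothetical helpers, not code from the DeID-VC repository. The EER sweep over observed scores is a simple approximation; standard tools interpolate the ROC/DET curve. The page does not say whether the reported 10% and 5% are absolute or relative differences.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,          # deletion of a reference word
                curr[j - 1] + 1,      # insertion of a hypothesis word
                prev[j - 1] + cost,   # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of FAR and FRR where the two are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Same-speaker trials rejected at threshold t
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Different-speaker trials accepted at threshold t
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A perfectly separable verifier yields an EER of 0, and one whose target and non-target scores are indistinguishable yields an EER of 0.5, which is the ideal outcome from a de-identification standpoint.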

Abstract

The widespread adoption of speech-based online services raises security and privacy concerns regarding the data that they use and share. If the data were compromised, attackers could exploit user speech to bypass speaker verification systems or even impersonate users. To mitigate this, we propose DeID-VC, a speaker de-identification system that converts a real speaker to pseudo speakers, thus removing or obfuscating the speaker-dependent attributes from a spoken voice. The key components of DeID-VC include a Variational Autoencoder (VAE) based Pseudo Speaker Generator (PSG) and a voice conversion Autoencoder (AE) under zero-shot settings. With the help of PSG, DeID-VC can assign unique pseudo speakers at speaker level or even at utterance level. Also, two novel learning objectives are added to bridge the gap between training and inference of zero-shot voice conversion. We present our…
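The speaker-level versus utterance-level assignment mentioned in the abstract can be illustrated with a toy sketch. It assumes, for illustration only, that the PSG yields a pseudo speaker embedding by decoding a latent vector drawn from a standard normal prior. The decoder here is a fixed random linear map standing in for a trained VAE decoder, and the dimensions, `secret` key, and function names are hypothetical rather than taken from the paper or its repository. Seeding the prior sample with a keyed hash is a design choice made here so that speaker-level assignment stays consistent without storing a lookup table. The key must stay private, since anyone holding it could recompute the mapping for a guessed speaker.

```python
import hashlib
import math
import random

LATENT_DIM = 4  # hypothetical latent size
EMBED_DIM = 8   # hypothetical speaker-embedding size

# Hypothetical stand-in for a trained VAE decoder: a fixed random linear map
_rng = random.Random(0)
DECODER_W = [[_rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(EMBED_DIM)]


def decode(z):
    """Map a latent vector z to an L2-normalised pseudo speaker embedding."""
    e = [sum(w * zi for w, zi in zip(row, z)) for row in DECODER_W]
    norm = math.sqrt(sum(x * x for x in e)) or 1.0
    return [x / norm for x in e]


def sample_latent(rng):
    """Draw z ~ N(0, I) from the prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def pseudo_speaker(speaker_id, utterance_id=None, secret="demo-key"):
    """Speaker level (utterance_id=None): one consistent pseudo speaker per real speaker.
    Utterance level: every utterance gets its own pseudo speaker."""
    if utterance_id is None:
        key = f"{secret}|spk|{speaker_id}"
    else:
        key = f"{secret}|utt|{speaker_id}|{utterance_id}"
    # Derive a deterministic seed from the key, then sample the prior with it
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return decode(sample_latent(random.Random(seed)))
```

In a zero-shot voice conversion pipeline, an embedding like this would take the place of the real target-speaker embedding that conditions the conversion model. Calling `pseudo_speaker("alice")` twice returns the same vector, while `pseudo_speaker("alice", "utt1")` and `pseudo_speaker("alice", "utt2")` differ.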

Code & Models

Repositories

a43992899/deid-vc
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Hate Speech and Cyberbullying Detection