Speaker anonymization using neural audio codec language models
Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans

TL;DR
This paper introduces a novel speaker anonymization method using neural audio codecs and language models, aiming to better obfuscate speaker identity by leveraging quantized codes that bottleneck speaker information.
Contribution
It proposes a NAC-based speaker anonymization approach that effectively reduces speaker information in speech signals, addressing limitations of previous methods involving speaker embedding perturbation.
Findings
NAC-based anonymization reduces speaker identity leakage.
The anonymized speech it synthesizes remains high quality.
Evaluation on the Voice Privacy Challenge 2022 shows promising results.
Abstract
The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information. We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information: we demonstrate the potential of speaker anonymization systems…
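To make the "quantized codes as a bottleneck" idea concrete, here is a minimal sketch of residual vector quantization, the scheme used by common neural audio codecs such as EnCodec and SoundStream. The abstract does not say which codec or configuration the authors use, so everything here is an assumption for illustration: the function names (`rvq_encode`, `rvq_decode`), the sizes, and the random codebooks, which stand in for codebooks that a real codec would learn.

```python
import random

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(codebooks, vec):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Rebuild an approximation of the frame by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical sizes; real codecs use far larger, learned codebooks.
DIM, STAGES, ENTRIES = 4, 3, 8
rng = random.Random(0)
codebooks = [
    [[rng.uniform(-1, 1) / (2 ** s) for _ in range(DIM)] for _ in range(ENTRIES)]
    for s in range(STAGES)
]

frame = [0.3, -0.5, 0.1, 0.7]           # stand-in for one encoder output frame
codes = rvq_encode(codebooks, frame)    # STAGES small integers per frame
recon = rvq_decode(codebooks, codes)    # lossy reconstruction from codes only
```

Each frame is reduced to `STAGES` integers, so at most `ENTRIES ** STAGES` distinct frames can be represented. Anything the codes cannot express is discarded, which is the sense in which quantization acts as an information bottleneck. How much speaker information survives in practice depends on the trained codec, which this toy example does not model.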
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
