Speaker anonymization using neural audio codec language models

Michele Panariello; Francesco Nespoli; Massimiliano Todisco; Nicholas Evans

arXiv:2309.14129·eess.AS·January 15, 2024

TL;DR

This paper introduces a novel speaker anonymization method using neural audio codecs and language models, aiming to better obfuscate speaker identity by leveraging quantized codes that bottleneck speaker information.

Contribution

It proposes a NAC-based speaker anonymization approach that effectively reduces speaker information in speech signals, addressing limitations of previous methods involving speaker embedding perturbation.

Findings

01. NAC-based anonymization reduces speaker identity leakage.

02. The approach achieves high-quality synthetic speech with anonymization.

03. Evaluation shows promising results on the VoicePrivacy Challenge 2022.

Abstract

The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information. We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information: we demonstrate the potential of speaker anonymization systems…
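The abstract's claim that quantized codes act as a bottleneck can be illustrated with a toy sketch of residual vector quantization, a scheme commonly used in neural audio codecs such as SoundStream and EnCodec. This is not the authors' system: the codebooks are random rather than learned, and the feature dimension, codebook size, and number of quantizers are hypothetical values chosen for illustration. The point is only that each frame is reduced to a handful of integer indices, which caps how much information (speaker-related or otherwise) can pass through.

```python
import numpy as np

def rvq_encode(frames, codebooks):
    """Quantize frames with a cascade of codebooks and return integer codes."""
    residual = frames.copy()
    codes = []
    for cb in codebooks:  # cb has shape (codebook_size, dim)
        # Squared distance from every residual frame to every codeword.
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        codes.append(idx)
        # The next quantizer only sees what this one failed to explain.
        residual = residual - cb[idx]
    return np.stack(codes, axis=0)  # shape (num_quantizers, num_frames)

def rvq_decode(codes, codebooks):
    """Reconstruct frames by summing the selected codewords of each stage."""
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))

# Hypothetical sizes; random codebooks stand in for learned ones.
rng = np.random.default_rng(0)
dim, num_frames, codebook_size, num_quantizers = 8, 50, 16, 4
frames = rng.normal(size=(num_frames, dim))
codebooks = [rng.normal(size=(codebook_size, dim)) * (0.5 ** q)
             for q in range(num_quantizers)]

codes = rvq_encode(frames, codebooks)
recon = rvq_decode(codes, codebooks)

# Each frame is described by num_quantizers indices of log2(codebook_size) bits.
bits_per_frame = num_quantizers * int(np.log2(codebook_size))
print(codes.shape, recon.shape, bits_per_frame)
```

In this sketch a frame of eight floating-point values is replaced by four indices into 16-entry codebooks. A language model operating on such codes only ever sees the discrete indices, which is the sense in which the representation is a bottleneck.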

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing