Speaker Anonymization with Phonetic Intermediate Representations

Sarina Meyer; Florian Lux; Pavel Denisov; Julia Koch; Pascal Tilli; Ngoc Thang Vu

arXiv:2207.04834 · cs.SD · July 12, 2022

TL;DR

This paper introduces a speaker anonymization method that uses phones as the intermediate representation: automatic speech recognition transcribes the input, and speech synthesis regenerates it with an anonymized speaker embedding, hiding speaker identity while preserving the spoken content. Experiments on LibriSpeech and VCTK show that the method is robust to recognition errors and improves privacy.

Contribution

The work presents a speaker anonymization pipeline that synthesizes speech from phonetic transcriptions and anonymized speaker embeddings, improving privacy robustness and speech quality over the Voice Privacy Challenge 2020 baselines.
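The data flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `AnonymizationPipeline`, its four component names, and the toy stand-ins in `demo_pipeline` are all hypothetical, and the real ASR, speaker encoder, and synthesis models are replaced by placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]   # assumed: audio as a list of samples
Phones = List[str]       # assumed: phonetic transcription as phone symbols
Embedding = List[float]  # assumed: fixed-size speaker vector


@dataclass
class AnonymizationPipeline:
    """Hypothetical wiring of the four stages; each stage is pluggable."""
    recognize_phones: Callable[[Waveform], Phones]
    extract_embedding: Callable[[Waveform], Embedding]
    anonymize_embedding: Callable[[Embedding], Embedding]
    synthesize: Callable[[Phones, Embedding], Waveform]

    def anonymize(self, audio: Waveform) -> Waveform:
        # 1. Reduce the input to phones, discarding the speaker's voice.
        phones = self.recognize_phones(audio)
        # 2. Extract the original speaker embedding, then replace it.
        original = self.extract_embedding(audio)
        anonymous = self.anonymize_embedding(original)
        # 3. Synthesize new speech from the phones and the anonymized embedding.
        return self.synthesize(phones, anonymous)


def demo_pipeline() -> AnonymizationPipeline:
    """Toy stand-ins (not real models) so the sketch can be run end to end."""
    return AnonymizationPipeline(
        recognize_phones=lambda audio: ["HH", "AH", "L", "OW"],
        extract_embedding=lambda audio: [0.2, -0.4],
        anonymize_embedding=lambda emb: [-x for x in emb],
        synthesize=lambda phones, emb: [float(len(phones))] + emb,
    )
```

Only the phones and the anonymized embedding reach the synthesis stage in this sketch, which is the property the phonetic intermediate representation is meant to provide.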

Findings

01

The neural speech synthesis system handles imperfect ASR transcriptions, which keeps the pipeline feasible and robust.

02

Combining speaker embeddings from multiple sources improves anonymization, provided they are appropriately normalized.

03

Outperforms Voice Privacy Challenge 2020 baselines in privacy and speech quality.

Abstract

In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings. Using phones as the intermediate representation ensures near-complete elimination of speaker identity information from the input while preserving the original phonetic content as much as possible. Our experimental results on LibriSpeech and VCTK corpora reveal two key findings: 1) although automatic speech recognition produces imperfect transcriptions, our neural speech synthesis system can handle such errors, making our system feasible and robust, and 2) combining speaker embeddings from different resources is beneficial and their appropriate normalization is crucial. Overall, our final best system significantly outperforms the baselines provided in the Voice…
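To illustrate the second finding, here is one possible way to combine speaker embeddings from different sources. The text above does not say which normalization or combination the authors use, so both choices here are assumptions: each embedding is scaled to unit L2 norm and the results are concatenated. The function names `l2_normalize` and `combine_embeddings` are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are left as zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def combine_embeddings(*embeddings: Sequence[float]) -> List[float]:
    """Assumed combination: normalize each source separately, then concatenate.

    Normalizing first keeps a source with large raw values from dominating
    the combined vector.
    """
    combined: List[float] = []
    for emb in embeddings:
        combined.extend(l2_normalize(emb))
    return combined
```

Without the per-source normalization, two embeddings on very different scales, such as `[3.0, 4.0]` and `[0.0, 1000.0]`, would produce a combined vector dominated by the second source, which is one plausible reason normalization matters.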

Code & Models

Repositories

digitalphonetics/speaker-anonymization
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Hate Speech and Cyberbullying Detection