Preserving spoken content in voice anonymisation with character-level vocoder conditioning

Michele Panariello; Massimiliano Todisco; Nicholas Evans

arXiv:2408.04306·eess.AS·August 9, 2024

TL;DR

This paper introduces a voice anonymisation method that preserves spoken content by conditioning the vocoder on the output of an auxiliary automatic speech recognition (ASR) model. Relative to a baseline, it reduces the word error rate by nearly 60% at only a modest cost in anonymisation performance.

Contribution

To the best of the authors' knowledge, this is the first attempt to actively preserve spoken content in voice anonymisation. It does so through character-level vocoder conditioning with learnable embedding dictionaries.

Findings

1. Word error rate decreased by nearly 60% relative to the baseline
2. Effective preservation of spoken content with a modest anonymisation trade-off
3. First approach to integrate ASR outputs into vocoder conditioning for anonymisation
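
The headline finding is stated as a relative reduction in word error rate (WER). As a reminder of how such a figure is usually derived, here is a minimal sketch: WER is the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function names and the numbers in the usage note are illustrative assumptions, not taken from the paper or its repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word out of six. With made-up WER values of 10.0 and 4.0, `relative_reduction(10.0, 4.0)` corresponds to a 60% relative decrease. These values are placeholders and are not results reported by the authors.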

Abstract

Voice anonymisation can be used to help protect speaker privacy when speech data is shared with untrusted others. In most practical applications, while the voice identity should be sanitised, other attributes such as the spoken content should be preserved. There is always a trade-off; all approaches reported thus far sacrifice spoken content for anonymisation performance. We report what is, to the best of our knowledge, the first attempt to actively preserve spoken content in voice anonymisation. We show how the output of an auxiliary automatic speech recognition model can be used to condition the vocoder module of an anonymisation system using a set of learnable embedding dictionaries in order to preserve spoken content. Relative to a baseline approach, and for only a modest cost in anonymisation performance, the technique is successful in decreasing the word error rate computed from…
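
The conditioning idea in the abstract can be pictured as a lookup-and-combine step. The sketch below is a toy, standard-library illustration and not the authors' implementation. It assumes that the ASR model yields one character label per frame, that a single embedding table is used (the abstract mentions a set of dictionaries), and that the embedding is simply added to the vocoder input features. The vocabulary, dimensions, and function names are hypothetical.

```python
import random

VOCAB = ["<blank>", " ", "a", "b", "c"]  # hypothetical character set
EMBED_DIM = 4                            # hypothetical embedding size


def make_embedding_table(vocab, dim, seed=0):
    """Randomly initialised table; in a trained system these vectors
    would be learnable parameters updated together with the vocoder."""
    rng = random.Random(seed)
    return {ch: [rng.gauss(0.0, 0.1) for _ in range(dim)] for ch in vocab}


def condition_frames(features, frame_chars, table):
    """Add each frame's character embedding to that frame's features.

    Additive conditioning is an assumption made for this sketch.
    """
    if len(features) != len(frame_chars):
        raise ValueError("need one character label per feature frame")
    return [[f + e for f, e in zip(frame, table[ch])]
            for frame, ch in zip(features, frame_chars)]


# Toy usage: three all-zero feature frames and their per-frame characters.
table = make_embedding_table(VOCAB, EMBED_DIM)
features = [[0.0] * EMBED_DIM for _ in range(3)]
frame_chars = ["a", "<blank>", "b"]
conditioned = condition_frames(features, frame_chars, table)
```

Because the toy features are all zeros, each conditioned frame equals the embedding of its character, which makes the effect of the lookup easy to inspect. A practical system would more likely use a trainable embedding layer from a deep learning framework.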

Code & Models

Repositories

m-pana/spk_anon_nac_lm (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training