Vocoder drift compensation by x-vector alignment in speaker anonymisation

Michele Panariello; Massimiliano Todisco; Nicholas Evans

arXiv:2307.08403 · eess.AS · July 18, 2023 · 1 citation

TL;DR

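This paper investigates vocoder drift in x-vector-based speaker anonymisation, traces it to a mismatch between the substituted x-vector and the original representations of linguistic content, intonation and prosody, and proposes a compensation method that reduces drift and improves control over the x-vector space, at the cost of some anonymisation performance.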

Contribution

It introduces an original approach to vocoder drift compensation based on x-vector alignment, which reduces drift and gives finer control over the x-vector space.

Findings

01

Vocoder drift is caused by a mismatch between the substituted x-vector and the original representations of linguistic content, intonation and prosody.

02

Compensation reduces vocoder drift substantially, although anonymisation performance degrades as expected.

03

Compensation offers improved control over the x-vector space.

Abstract

For the most popular x-vector-based approaches to speaker anonymisation, the bulk of the anonymisation can stem from vocoding rather than from the core anonymisation function which is used to substitute an original speaker x-vector with that of a fictitious pseudo-speaker. This phenomenon can impede the design of better anonymisation systems since there is a lack of fine-grained control over the x-vector space. The work reported in this paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody. Also reported is an original approach to vocoder drift compensation. While anonymisation performance degrades as expected, compensation reduces vocoder drift substantially, offers improved control over the x-vector space and lays a foundation for…
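To make the notion of drift concrete, the sketch below compares the x-vector requested by the anonymisation function with the x-vector re-extracted from the vocoded output. It is a minimal illustration only: measuring drift as a cosine distance, the function names `cosine_distance` and `vocoder_drift`, and the three-dimensional toy vectors are all assumptions made here, not the metric or implementation reported in the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def vocoder_drift(target_xvector, reextracted_xvector):
    """Hypothetical drift measure (an assumption, not the paper's metric):
    the distance between the pseudo-speaker x-vector fed to the vocoder and
    the x-vector re-extracted from the synthesised speech."""
    return cosine_distance(target_xvector, reextracted_xvector)


# Toy 3-dimensional stand-ins for real x-vectors (illustrative values only).
original = [1.0, 0.0, 0.0]   # x-vector of the original speaker
pseudo = [0.0, 1.0, 0.0]     # x-vector chosen by the anonymisation function
vocoded = [0.6, 0.8, 0.0]    # x-vector re-extracted after vocoding

intended_shift = cosine_distance(original, pseudo)  # change requested on purpose
drift = vocoder_drift(pseudo, vocoded)              # change added by vocoding

print(f"intended shift: {intended_shift:.2f}")
print(f"vocoder drift:  {drift:.2f}")
```

Under this toy measure, a drift of zero would mean the synthesised speech carries exactly the requested pseudo-speaker x-vector; any larger value reflects change that the anonymisation function did not ask for.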

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems