Evaluation of the Speech Resynthesis Capabilities of the VoicePrivacy Challenge Baseline B1

Ünal Ege Gaznepoglu; Nils Peters

arXiv:2308.11337·eess.AS·August 23, 2023

Open Access

TL;DR

This paper evaluates the speech synthesis quality of the VoicePrivacy Challenge Baseline B1, revealing artifacts and unnaturalness in speech output through objective metrics and listening tests, highlighting areas for improvement.

Contribution

It provides an analysis of the reproduction capabilities of the VPC Baseline B1, assessing its effectiveness in synthesizing human-like speech and identifying sources of artifacts.

Findings

01

Artifacts and unnatural perception caused by the speech representation and vocoder

02

Objective metrics indicate reduced speech quality and waveform similarity

03

Listening tests confirm the presence of artifacts and unnaturalness

Abstract

Speaker anonymization systems continue to improve their ability to obfuscate the original speaker characteristics in a speech signal, but often create processing artifacts and unnatural-sounding voices as a tradeoff. Many of those systems stem from the VoicePrivacy Challenge (VPC) Baseline B1, using a neural vocoder to synthesize speech from an F0, x-vectors and bottleneck features-based speech representation. Inspired by this, we investigate the reproduction capabilities of the aforementioned baseline, to assess how successful the shared methodology is in synthesizing human-like speech. We use four objective metrics to measure speech quality, waveform similarity, and F0 similarity. Our findings indicate that both the speech representation and the vocoder introduce artifacts, causing an unnatural perception. A MUSHRA-like listening test with 18 subjects corroborates our findings,…
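
The abstract names waveform similarity and F0 similarity as measured quantities but this page does not say which four metrics the authors used. As a rough illustration only, the sketch below computes two common stand-ins: scale-invariant SDR for waveform similarity and Pearson correlation of F0 contours over frames voiced in both signals. The function names `si_sdr` and `f0_pearson`, the choice of these two measures, and the convention that unvoiced frames carry F0 = 0 are all assumptions, not taken from the paper.

```python
import numpy as np


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB (assumed stand-in for waveform similarity)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Remove DC offset so a constant bias is not counted as signal.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def f0_pearson(f0_ref, f0_est):
    """Pearson correlation of F0 contours over frames voiced in both signals.

    Assumes unvoiced frames are marked with F0 = 0 (hypothetical convention).
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    voiced = (f0_ref > 0) & (f0_est > 0)
    if voiced.sum() < 2:
        return float("nan")  # not enough voiced frames to correlate
    return float(np.corrcoef(f0_ref[voiced], f0_est[voiced])[0, 1])


if __name__ == "__main__":
    # Synthetic example: a 220 Hz tone and a noisy, rescaled copy of it.
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 220.0 * t)
    resynth = 0.8 * clean + 0.05 * rng.standard_normal(clean.shape)
    print("SI-SDR (dB):", si_sdr(clean, resynth))
    print("F0 correlation:", f0_pearson([0, 100, 110, 120, 0], [0, 102, 108, 125, 90]))
```

In a real evaluation the F0 contours would come from a pitch tracker run on the original and resynthesized utterances, and the two waveforms would need to be time-aligned before computing a sample-level measure such as SI-SDR.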

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems