How do Voices from Past Speech Synthesis Challenges Compare Today?

Erica Cooper; Junichi Yamagishi

arXiv:2105.02373·cs.SD·July 1, 2021·1 cites

TL;DR

This paper revisits past speech synthesis challenges, conducts a large-scale listening test on combined samples, and analyzes how opinions and quality perceptions have evolved over time.

Contribution

It provides a comprehensive comparison of past speech synthesis systems and insights into how perceptions of quality change across different challenges and speakers.

Findings

01. Strong correlation between original and new test results at system level
02. Speaker choice significantly impacts synthesis quality
03. Historical challenge data is valuable for ongoing research
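
The first finding concerns agreement at the system level. One common way to measure that kind of agreement is to average the listener ratings for each system in each test and then correlate the two sets of averages. The sketch below illustrates the idea under stated assumptions: the function names, the `(system_id, score)` tuple layout, and the toy ratings are all hypothetical, and this page does not state which correlation measure the authors used, so Pearson correlation stands in here as an example.

```python
from collections import defaultdict
from math import sqrt


def system_means(ratings):
    """Average the scores per system.

    `ratings` is an iterable of (system_id, score) pairs (assumed layout).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for system_id, score in ratings:
        totals[system_id][0] += score
        totals[system_id][1] += 1
    return {s: total / count for s, (total, count) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def system_level_correlation(original, new):
    """Correlate per-system mean scores over systems present in both tests."""
    orig_means = system_means(original)
    new_means = system_means(new)
    shared = sorted(orig_means.keys() & new_means.keys())
    return pearson([orig_means[s] for s in shared],
                   [new_means[s] for s in shared])


# Made-up toy ratings, not data from the paper.
original_test = [("A", 4), ("A", 5), ("B", 2), ("B", 3), ("C", 3), ("C", 4)]
new_test = [("A", 4), ("A", 4), ("B", 2), ("B", 2), ("C", 3), ("C", 3)]
r = system_level_correlation(original_test, new_test)
```

Averaging before correlating is what distinguishes a system-level comparison from an utterance-level one: individual ratings are noisy, so per-system means can agree strongly across tests even when single ratings do not.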

Abstract

Shared challenges provide a venue for comparing systems trained on common data using a standardized evaluation, and they also provide an invaluable resource for researchers when the data and evaluation results are publicly released. The Blizzard Challenge and Voice Conversion Challenge are two such challenges for text-to-speech synthesis and for speaker conversion, respectively, and their publicly-available system samples and listening test results comprise a historical record of state-of-the-art synthesis methods over the years. In this paper, we revisit these past challenges and conduct a large-scale listening test with samples from many challenges combined. Our aims are to analyze and compare opinions of a large number of systems together, to determine whether and how opinions change over time, and to collect a large-scale dataset of a diverse variety of synthetic samples and their…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Topic Modeling