Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Yi Zhao; Wen-Chin Huang; Xiaohai Tian; Junichi Yamagishi; Rohan Kumar Das; Tomi Kinnunen; Zhenhua Ling; Tomoki Toda

arXiv:2008.12527·eess.AS·August 31, 2020·34 cites

Open Access

TL;DR

The Voice Conversion Challenge 2020 evaluated voice conversion (VC) systems on a new database covering two tasks: intra-lingual semi-parallel VC and cross-lingual VC. Crowd-sourced listening tests showed rapid progress, but no system reached human-level naturalness, and cross-lingual conversion proved the harder task.

Contribution

This paper presents the third edition of the VC challenge with new datasets and comprehensive evaluation results, highlighting advancements and remaining challenges in VC technology.

Findings

01

Several systems achieved speaker similarity scores as high as those of the target speakers in the intra-lingual semi-parallel task.

02

Naturalness remains below human levels for intra-lingual VC.

03

Cross-lingual VC shows promising results, with the best systems scoring above 4.0 in MOS.

Abstract

The voice conversion challenge is a bi-annual scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed a new database for two tasks, intra-lingual semi-parallel and cross-lingual VC. After a two-month challenge period, we received 33 submissions, including 3 baselines built on the database. From the results of crowd-sourced listening tests, we observed that VC methods have progressed rapidly thanks to advanced deep learning methods. In particular, speaker similarity scores of several systems turned out to be as high as target speakers in the intra-lingual semi-parallel VC task. However, we confirmed that none of them have achieved human-level naturalness yet for the same task. The cross-lingual conversion task is, as expected, a more…

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Voice and Speech Disorders