The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings

Kangxiang Xia; Dake Guo; Jixun Yao; Liumeng Xue; Hanzhao Li; Shuai Wang; Zhao Guo; Lei Xie; Qingqing Zhang; Lei Luo; Minghui Dong; Peng Sun

arXiv:2411.00064 · cs.SD · November 4, 2024

TL;DR

The ISCSLP 2024 CoVoC Challenge benchmarks zero-shot spontaneous style voice cloning, emphasizing spontaneous conversational speech generation with unconstrained and constrained tracks, supported by a new high-quality dataset.

Contribution

This paper introduces the CoVoC Challenge with two tracks, provides a new high-quality dataset, and reports evaluation results and insights on spontaneous voice cloning.

Findings

1. Unconstrained models outperform constrained ones in spontaneous speech quality.

2. High-quality dataset improves zero-shot voice cloning performance.

3. Evaluation reveals key challenges in spontaneous style voice synthesis.

Abstract

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge aims to benchmark and advance zero-shot spontaneous-style voice cloning, with a particular focus on generating spontaneous behaviors in conversational speech. The challenge comprises two tracks: an unconstrained track with no limits on data or model usage, and a constrained track that allows only a restricted set of open-source datasets. A 100-hour high-quality conversational speech dataset is also released with the challenge. This paper details the data, tracks, submitted systems, evaluation results, and findings.

Taxonomy

Topics: AI in Service Interactions