The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings
Kangxiang Xia, Dake Guo, Jixun Yao, Liumeng Xue, Hanzhao Li, Shuai Wang, Zhao Guo, Lei Xie, Qingqing Zhang, Lei Luo, Minghui Dong, Peng Sun

TL;DR
The ISCSLP 2024 CoVoC Challenge benchmarks zero-shot spontaneous-style voice cloning for conversational speech. It offers an unconstrained and a constrained track and is supported by a new high-quality conversational speech dataset.
Contribution
This paper introduces the CoVoC Challenge with two tracks, provides a new high-quality dataset, and reports evaluation results and insights on spontaneous voice cloning.
Findings
Unconstrained models outperform constrained ones in spontaneous speech quality.
The released high-quality dataset improves zero-shot voice cloning performance.
Evaluation reveals key challenges in spontaneous style voice synthesis.
Abstract
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge aims to benchmark and advance zero-shot spontaneous-style voice cloning, with a particular focus on generating the spontaneous behaviors found in conversational speech. The challenge comprises two tracks: an unconstrained track with no limitations on data or model usage, and a constrained track that allows only a restricted set of open-source datasets. A 100-hour high-quality conversational speech dataset is also released with the challenge. This paper details the data, tracks, submitted systems, evaluation results, and findings.
