The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024

Shuoyi Zhou; Yixuan Zhou; Weiqin Li; Jun Chen; Runchuan Ye; Weihao Wu; Zijian Lin; Shun Lei; Zhiyong Wu

arXiv:2412.01100·cs.SD·February 5, 2025

TL;DR

This paper presents a zero-shot spontaneous style TTS system built on a LLaMA-based codec language model with classifier-free guidance. In the CoVoC 2024 constrained track it achieved the best speech naturalness MOS (3.80), along with solid speech quality and speaker similarity results.

Contribution

It introduces a LLaMA-based codec language model that combines a delay pattern with classifier-free guidance for zero-shot spontaneous style voice cloning.
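The delay pattern mentioned above can be illustrated with a small sketch. This page does not describe the authors' exact layout, so the code assumes the common form in which codebook k of a multi-codebook codec is shifted right by k steps, letting a single language model predict one token per codebook at each step. The function names and the `PAD` token id are hypothetical.

```python
from typing import List

PAD = -1  # hypothetical padding token id, not taken from the paper


def apply_delay_pattern(codes: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Shift codebook k right by k steps.

    `codes` is a K x T grid (K codebooks, T frames). The result is a
    K x (T + K - 1) grid in which unused positions hold `pad`.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    delayed = []
    for k, row in enumerate(codes):
        left = [pad] * k                         # delay before codebook k starts
        right = [pad] * (num_codebooks - 1 - k)  # fill so all rows share one length
        delayed.append(left + list(row) + right)
    return delayed


def undo_delay_pattern(delayed: List[List[int]]) -> List[List[int]]:
    """Invert apply_delay_pattern and recover the original K x T grid."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - num_codebooks + 1
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# Two codebooks, three frames.
codes = [[1, 2, 3], [4, 5, 6]]
delayed = apply_delay_pattern(codes)
restored = undo_delay_pattern(delayed)
```

With this layout, the token for codebook k at frame t is predicted only after the coarser codebooks for that frame have already been emitted, which is the usual motivation for delaying the finer codebooks.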

Findings

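1. Achieved the best speech naturalness MOS (3.80) in the CoVoC constrained track
2. Obtained solid speech quality and speaker similarity results
3. Data preprocessing and fine-tuning on selected high-quality spontaneous speech data were used to improve utterance quality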

Abstract

This paper describes the zero-shot spontaneous style TTS system for the ISCSLP 2024 Conversational Voice Clone Challenge (CoVoC). We propose a LLaMA-based codec language model with a delay pattern to achieve spontaneous style voice cloning. To improve speech intelligibility, we introduce the Classifier-Free Guidance (CFG) strategy in the language model to strengthen conditional guidance on token prediction. To generate high-quality utterances, we adopt effective data preprocessing operations and fine-tune our model with selected high-quality spontaneous speech data. The official evaluations in the CoVoC constrained track show that our system achieves the best speech naturalness MOS of 3.80 and obtains considerable speech quality and speaker similarity results.
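As a rough illustration of how classifier-free guidance can strengthen conditioning during token prediction, the sketch below uses the commonly cited formulation: the model is evaluated once with the condition and once without, and the two logit vectors are extrapolated. The abstract does not give the authors' exact formula or guidance scale, so the function names and the scale value here are assumptions.

```python
import math
from typing import List


def cfg_logits(cond_logits: List[float],
               uncond_logits: List[float],
               guidance_scale: float) -> List[float]:
    """Combine conditional and unconditional logits.

    guidance_scale = 1.0 returns the conditional logits unchanged;
    larger values push the distribution further toward the condition.
    """
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits over a three-token vocabulary (illustrative values only).
cond = [2.0, 0.5, -1.0]
uncond = [1.0, 0.5, 0.0]
guided = cfg_logits(cond, uncond, guidance_scale=2.0)  # hypothetical scale
probs = softmax(guided)
```

In this toy case the token favored by the conditional pass receives a higher probability after guidance than it does from the conditional logits alone, which is the intended sharpening effect.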

Taxonomy

Topics: Robotics and Automated Systems