Continuous Speech Tokenizer in Text To Speech

Yixing Li; Ruobing Xie; Xingwu Sun; Yu Cheng; Zhanhui Kang

arXiv:2410.17081 · cs.SD · April 1, 2025

TL;DR

This paper introduces Cont-SPT, a continuous speech tokenizer for text-to-speech systems that preserves more information than discrete tokenizers, leading to improved speech quality and continuity.

Contribution

The paper proposes a novel continuous speech tokenizer, Cont-SPT, which reduces information loss in speech representation for TTS applications.

Findings

01. Cont-SPT achieves higher estimated Mean Opinion Scores (MoS).

02. Cont-SPT preserves more information across the frequency spectrum.

03. The approach improves speech continuity in TTS models.

Abstract

The fusion of speech and language in the era of large language models has garnered significant attention. Discrete speech tokens are often used in text-to-speech tasks for speech compression and portability; they are convenient for joint training with text and offer good compression efficiency. However, we found that discrete speech tokenizers still suffer from information loss. Therefore, we propose a simple yet effective continuous speech tokenizer named Cont-SPT, and a text-to-speech model based on continuous speech tokens. Our results show that the speech language model based on the continuous speech tokenizer has better continuity and higher estimated Mean Opinion Scores (MoS). This enhancement is attributed to the better information preservation rate of the continuous speech tokenizer across both low and high frequencies. The code and resources for…
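As a rough illustration of the information-loss argument in the abstract, the toy sketch below compares a continuous latent with a quantized (discrete) version of the same latent and reports how much spectral energy each reconstruction preserves in a low and a high frequency band. It is not Cont-SPT and not the authors' code: the synthetic frames, the PCA encoder, the randomly sampled codebook, and the names `make_frames`, `encode_decode`, and `band_preservation` are all assumptions made for illustration.

```python
import numpy as np


def make_frames(n_frames=512, frame_len=64, seed=0):
    """Synthetic frames: one low-frequency plus one high-frequency sinusoid each (toy data)."""
    rng = np.random.default_rng(seed)
    t = np.arange(frame_len) / frame_len
    low_f = rng.uniform(1.0, 4.0, size=(n_frames, 1))     # cycles per frame
    high_f = rng.uniform(16.0, 28.0, size=(n_frames, 1))  # cycles per frame
    phase = rng.uniform(0.0, 2 * np.pi, size=(n_frames, 2))
    low = np.sin(2 * np.pi * low_f * t + phase[:, :1])
    high = 0.5 * np.sin(2 * np.pi * high_f * t + phase[:, 1:])
    return low + high


def encode_decode(frames, latent_dim=16, codebook_size=64, seed=0):
    """Return (continuous reconstruction, discrete reconstruction, token ids)."""
    rng = np.random.default_rng(seed)
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Assumed encoder: PCA projection onto the top principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:latent_dim]                    # (latent_dim, frame_len), orthonormal rows
    latents = centered @ basis.T               # continuous "tokens"
    recon_cont = latents @ basis + mean
    # Assumed discrete tokenizer: snap each latent to its nearest codebook entry.
    picks = rng.choice(len(latents), size=codebook_size, replace=False)
    codebook = latents[picks]
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    token_ids = dists.argmin(axis=1)
    recon_disc = codebook[token_ids] @ basis + mean
    return recon_cont, recon_disc, token_ids


def band_preservation(original, reconstructed, split_bin):
    """1 - (error energy / reference energy), for bins below and at-or-above split_bin."""
    spec_o = np.fft.rfft(original, axis=-1)
    spec_r = np.fft.rfft(reconstructed, axis=-1)
    err = np.abs(spec_o - spec_r) ** 2
    ref = np.abs(spec_o) ** 2
    low = 1.0 - err[:, :split_bin].sum() / ref[:, :split_bin].sum()
    high = 1.0 - err[:, split_bin:].sum() / ref[:, split_bin:].sum()
    return low, high


frames = make_frames()
recon_cont, recon_disc, token_ids = encode_decode(frames)
for name, recon in [("continuous", recon_cont), ("discrete", recon_disc)]:
    low, high = band_preservation(frames, recon, split_bin=8)
    print(f"{name:>10}: low-band {low:.3f}, high-band {high:.3f}")
```

In this setup the discrete path cannot have a lower total reconstruction error than the continuous one, because the quantization error adds to the projection error in an orthonormal basis. The sketch says nothing about real speech or about the numbers reported in the paper; it only makes the notion of per-band information preservation concrete.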

Code & Models

Repositories

yixing-li/continuous-speech-tokenizer (Official)

Videos

Continuous Speech Tokenizer in Text To Speech · underline

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing