On the Effectiveness of Acoustic BPE in Decoder-Only TTS

Bohan Li; Feiyu Shen; Yiwei Guo; Shuai Wang; Xie Chen; Kai Yu

arXiv:2407.03892 · cs.SD · October 30, 2024

Open Access

TL;DR

This paper studies acoustic byte-pair encoding (BPE) in decoder-only text-to-speech models and shows, in experiments on LibriTTS, that it improves the intelligibility and diversity of synthesized speech.

Contribution

It presents a comprehensive study of acoustic BPE settings in decoder-only TTS models that use semantic speech tokens, addressing two open questions: how much acoustic BPE helps TTS, and which BPE setting to choose.

Findings

01 Acoustic BPE improves the intelligibility and diversity of synthesized speech.

02 Different BPE settings produce synthesized speech with different characteristics.

03 Acoustic BPE is a favorable tool for decoder-only TTS.

Abstract

Discretizing speech into tokens and generating them by a decoder-only model have been a promising direction for text-to-speech (TTS) and spoken language modeling (SLM). To shorten the sequence length of speech tokens, acoustic byte-pair encoding (BPE) has emerged in SLM that treats speech tokens from self-supervised semantic representations as characters to further compress the token sequence. But the gain in TTS has not been fully investigated, and the proper choice of acoustic BPE remains unclear. In this work, we conduct a comprehensive study on various settings of acoustic BPE to explore its effectiveness in decoder-only TTS models with semantic speech tokens. Experiments on LibriTTS verify that acoustic BPE uniformly increases the intelligibility and diversity of synthesized speech, while showing different features across BPE settings. Hence, acoustic BPE is a favorable tool for…
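As an illustration of the idea described in the abstract, the following is a minimal sketch of BPE applied to sequences of discrete speech-token IDs, treating each ID as a "character". It is not the authors' implementation: the function names (`learn_acoustic_bpe`, `apply_merge`, `encode`), the toy token IDs, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged = pair[0] + pair[1]  # concatenating tuples keeps the underlying base tokens
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_acoustic_bpe(sequences, num_merges):
    """Learn up to `num_merges` merges from sequences of integer speech-token IDs."""
    # Each symbol is a tuple of base token IDs; initially one ID per symbol.
    seqs = [[(t,) for t in seq] for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties broken by the pair itself (an assumption).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        seqs = [apply_merge(seq, best) for seq in seqs]
    return merges


def encode(seq, merges):
    """Compress one token sequence by applying the learned merges in order."""
    symbols = [(t,) for t in seq]
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


if __name__ == "__main__":
    # Hypothetical semantic-token sequences (for example, cluster IDs per frame).
    corpus = [[5, 5, 7, 7, 7, 12, 12], [5, 5, 7, 7, 3]]
    merges = learn_acoustic_bpe(corpus, num_merges=2)
    encoded = encode(corpus[0], merges)
    print(len(corpus[0]), "tokens ->", len(encoded), "BPE units")
```

Each merged unit stands for several original speech tokens, so a decoder-only model trained on the encoded sequence predicts fewer steps per utterance. Flattening the encoded units recovers the original token sequence exactly, so the compression is lossless at the token level.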

Taxonomy

Topics: Optical Network Technologies · Acoustic Wave Resonator Technologies · Blind Source Separation Techniques

Methods: Byte Pair Encoding