BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing

Masaya Kawamura; Takuya Hasumi; Yuma Shirahata; Ryuichi Yamamoto

arXiv:2506.03515·eess.AS·June 5, 2025

Open Access

TL;DR

This paper introduces a highly compact TTS model using ultra-low 1.58-bit quantization and weight indexing, achieving significant size reduction while maintaining high synthesis quality for on-device applications.

Contribution

The paper presents a novel combination of quantization-aware training and weight indexing to drastically reduce TTS model size without sacrificing quality.

Findings

01

Achieved an 83% reduction in model size.

02

Outperformed a non-quantized baseline of similar model size in synthesis quality.

03

Targets on-device TTS applications, with parameter storage that remains efficient on hardware that handles values in 8-bit units.

Abstract

This paper proposes a highly compact, lightweight text-to-speech (TTS) model for on-device applications. The proposed model uses two techniques to reduce model size. First, we introduce quantization-aware training (QAT), which quantizes model parameters during training to as low as 1.58 bits. In this case, most of the 32-bit model parameters are quantized to the ternary values {-1, 0, 1}. Second, we propose a method named weight indexing, in which a group of 1.58-bit weights is saved as a single int8 index. This allows efficient storage of model parameters even on hardware that handles values in 8-bit units. Experimental results demonstrate that the proposed method achieves an 83% reduction in model size while outperforming a non-quantized baseline of similar model size in synthesis quality.
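The two techniques in the abstract can be illustrated with a minimal Python sketch. The "1.58-bit" figure corresponds to log2(3) ≈ 1.585, the information content of one ternary value. Everything specific below is an assumption rather than the paper's stated method: the absmean scaling rule in `ternarize` (borrowed from BitNet b1.58-style quantization), the group size of 5 (chosen because 3^5 = 243 distinct groups fit in one 8-bit value), and the function names `ternarize`, `pack`, and `unpack`. The sketch also stores each index as an unsigned byte, and it shows only post-hoc rounding and packing, not quantization-aware training itself.

```python
GROUP = 5  # assumed group size: 3**5 = 243 combinations fit in one 8-bit index


def ternarize(weights):
    """Map float weights to {-1, 0, 1} with absmean scaling (assumed rule)."""
    # Fall back to 1.0 when all weights are zero to avoid dividing by zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def pack(trits):
    """Encode each group of GROUP ternary values as one base-3 index (0..242)."""
    padded = trits + [0] * (-len(trits) % GROUP)  # zero-pad the last group
    out = bytearray()
    for i in range(0, len(padded), GROUP):
        index = 0
        for t in padded[i:i + GROUP]:
            index = index * 3 + (t + 1)  # shift {-1, 0, 1} to digits {0, 1, 2}
        out.append(index)
    return bytes(out)


def unpack(data, n):
    """Decode indices back into the first n ternary values."""
    trits = []
    for index in data:
        group = []
        for _ in range(GROUP):
            index, digit = divmod(index, 3)
            group.append(digit - 1)
        trits.extend(reversed(group))  # digits come out least significant first
    return trits[:n]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, 0.7, -0.6]
trits, scale = ternarize(weights)
packed = pack(trits)
assert unpack(packed, len(trits)) == trits
```

Under these assumptions each byte carries five weights, or 1.6 bits per weight, which is close to the 1.585-bit lower bound for ternary values. Storing each ternary weight in its own byte would instead cost 8 bits per weight.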

Taxonomy

Topics: Speech Recognition and Synthesis · Embedded Systems Design Techniques · Advanced Data Compression Techniques