PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

Jiatong Shi; Haoran Wang; William Chen; Chenda Li; Wangyou Zhang; Jinchuan Tian; Shinji Watanabe

arXiv:2511.22687 · cs.SD · December 1, 2025

TL;DR

PURE Codec introduces a progressive quantization framework guided by a pre-trained speech enhancement model, improving training stability and reconstruction quality in low-bitrate neural speech codecs, especially under noisy training conditions.

Contribution

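It presents a novel multi-stage quantization method in which the first stage reconstructs low-entropy, denoised speech embeddings and later stages encode the remaining high-entropy residual, improving training stability and reconstruction quality in neural speech codecs.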

Findings

01. Outperforms conventional RVQ-based codecs in reconstruction quality

02. Improves the training stability of neural speech codecs

03. Improves downstream speech language model-based text-to-speech (TTS) performance under noisy training conditions

Abstract

Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framework that guides multi-stage quantization using a pre-trained speech enhancement model. The first quantization stage reconstructs low-entropy, denoised speech embeddings, while subsequent stages encode residual high-entropy components. This design improves training stability significantly. Experiments demonstrate that PURE consistently outperforms conventional RVQ-based codecs in reconstruction and downstream speech language model-based text-to-speech, particularly under noisy training conditions.
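
The abstract does not spell out how the enhancement model's guidance enters training, so the following is only a toy sketch under stated assumptions, not the authors' implementation. It shows plain residual vector quantization (each stage quantizes what earlier stages left over) and a hypothetical auxiliary term, `stage1_guidance_loss`, that measures how far the first-stage output sits from a denoised embedding. The function names, the 2-D vectors, the hand-picked codebooks and the "denoised" target are all illustrative; codebook learning and encoder gradients are omitted.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], x: Sequence[float]) -> int:
    """Index of the codeword closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))


def rvq_encode(z: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[List[int], List[Vector]]:
    """Plain RVQ: each stage quantizes the residual left by the previous stages."""
    residual = list(z)
    codes: List[int] = []
    stage_outputs: List[Vector] = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        q = cb[idx]
        codes.append(idx)
        stage_outputs.append(list(q))
        residual = [r - c for r, c in zip(residual, q)]
    return codes, stage_outputs


def reconstruct(stage_outputs: Sequence[Vector]) -> Vector:
    """Sum the stage outputs; keeping only the first k stages gives a coarser result."""
    return [sum(vals) for vals in zip(*stage_outputs)]


def stage1_guidance_loss(stage_outputs: Sequence[Vector], denoised: Vector) -> float:
    """Hypothetical PURE-style auxiliary term: distance between the stage-1 output
    and a denoised embedding from a (pre-trained) speech enhancement model."""
    return sq_dist(stage_outputs[0], denoised)


# Toy example (all values are illustrative assumptions).
z = [1.1, 0.9]          # "noisy" encoder frame
denoised = [1.0, 1.0]   # stand-in for a speech-enhancement embedding
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],   # stage 1: coarse, low-entropy content
    [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]],   # stage 2: fine residual detail
]
codes, outs = rvq_encode(z, codebooks)
print(codes)                                  # [1, 1]
print(stage1_guidance_loss(outs, denoised))   # 0.0
```

In this toy case the first stage alone already lands on the "denoised" vector and the second stage only carries the small residual deviation, which mirrors the division of labor the abstract describes between the low-entropy first stage and the high-entropy residual stages.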

Taxonomy

Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis