Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ

Yunkee Chae; Kyogu Lee

arXiv:2506.16538 · cs.SD · June 23, 2025

Open Access

TL;DR

This paper proposes a Variable Bitrate Residual Vector Quantization (VRVQ) framework that improves noise robustness and compression efficiency in neural speech coding by allocating bits per frame and incorporating a feature denoiser.

Contribution

The paper introduces a VRVQ method that adapts the bitrate per frame and integrates a feature denoiser, improving noise robustness and rate-distortion performance over constant bitrate (CBR) RVQ.

Findings

01 VRVQ outperforms conventional RVQ in noisy conditions.

02 Dynamic bitrate allocation improves compression efficiency.

03 Incorporating denoising enhances perceptual quality.

Abstract

Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades compression efficiency. Standard codecs allocate bits uniformly, wasting bitrate on noise components that do not contribute to intelligibility. This paper introduces a Variable Bitrate RVQ (VRVQ) framework for noise-robust speech coding, dynamically adjusting bitrate per frame to optimize rate-distortion trade-offs. Unlike constant bitrate (CBR) RVQ, our method prioritizes critical speech components while suppressing residual noise. Additionally, we integrate a feature denoiser to further improve noise robustness. Experimental results show that VRVQ improves rate-distortion trade-offs over conventional methods, achieving better compression efficiency and…
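
The per-frame allocation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the codebooks are random rather than learned, and the per-frame stage counts (`depths`) are supplied by hand, since the abstract does not say how they are chosen. The names `rvq_encode`, `bits_per_frame`, and `depths` are hypothetical. Each frame is quantized with only its first `depths[t]` RVQ stages, so a frame with fewer stages costs fewer bits.

```python
import numpy as np

def rvq_encode(frames, codebooks, depths):
    """Quantize each frame with its first `depths[t]` RVQ stages.

    frames:    (T, D) latent frames
    codebooks: (Nq, K, D), one codebook of K entries per stage (random here, an assumption)
    depths:    (T,) number of stages spent on each frame, in 0..Nq (hand-supplied, an assumption)
    Returns quantized frames (T, D) and code indices (T, Nq), with -1 for unused stages.
    """
    T, D = frames.shape
    Nq, K, _ = codebooks.shape
    residual = frames.copy()
    quantized = np.zeros_like(frames)
    codes = np.full((T, Nq), -1, dtype=np.int64)
    for q in range(Nq):
        active = depths > q  # frames that still spend bits at this stage
        if not active.any():
            break
        r = residual[active]  # (n, D)
        # Squared distance from each active residual to every codeword of stage q.
        dists = ((r[:, None, :] - codebooks[q][None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)
        chosen = codebooks[q][idx]
        quantized[active] += chosen
        residual[active] -= chosen
        codes[active, q] = idx
    return quantized, codes

def bits_per_frame(depths, codebook_size):
    """Bits spent per frame: one index of log2(K) bits for each stage used."""
    return depths * np.log2(codebook_size)

rng = np.random.default_rng(0)
frames = rng.normal(size=(3, 4))
codebooks = rng.normal(size=(8, 16, 4))
depths = np.array([0, 2, 8])  # e.g. few stages for a noise-only frame, all stages for speech
quantized, codes = rvq_encode(frames, codebooks, depths)
print(bits_per_frame(depths, codebook_size=16))
```

In a constant bitrate RVQ every entry of `depths` would equal the number of stages; the variable bitrate case differs only in letting that count change from frame to frame.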

Taxonomy

Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis