Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
Yunkee Chae, Kyogu Lee

TL;DR
This paper proposes a Variable Bitrate Residual Vector Quantization framework that enhances noise robustness and compression efficiency in neural speech coding by dynamically allocating bits and incorporating a denoiser.
Contribution
It introduces a novel VRVQ method that adapts bitrate per frame and integrates denoising, improving noise robustness and rate-distortion performance over traditional RVQ.
Findings
VRVQ outperforms conventional RVQ in noisy conditions.
Dynamic bitrate allocation improves compression efficiency.
Incorporating denoising enhances perceptual quality.
Abstract
Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades compression efficiency. Standard codecs allocate bits uniformly, wasting bitrate on noise components that do not contribute to intelligibility. This paper introduces a Variable Bitrate RVQ (VRVQ) framework for noise-robust speech coding, dynamically adjusting bitrate per frame to optimize rate-distortion trade-offs. Unlike constant bitrate (CBR) RVQ, our method prioritizes critical speech components while suppressing residual noise. Additionally, we integrate a feature denoiser to further improve noise robustness. Experimental results show that VRVQ improves rate-distortion trade-offs over conventional methods, achieving better compression efficiency and…
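To make the idea of per-frame bitrate in RVQ concrete, the following is a minimal sketch, not the authors' implementation. It assumes random toy codebooks, a zero codeword in every codebook (so that adding a stage can never increase the error), and hand-picked per-frame depths. In the paper the allocation is made dynamically by the model; the names `rvq_encode`, `rvq_decode`, `bits_for_depth`, and `depths` are hypothetical and chosen only for illustration.

```python
import math
import random

def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean)."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(c, vec))
        if d < best_d:
            best, best_d = i, d
    return best

def rvq_encode(frame, codebooks, depth):
    """Quantize one frame using only the first `depth` codebooks.

    Each stage quantizes the residual left over by the previous stage.
    """
    residual = list(frame)
    codes = []
    for cb in codebooks[:depth]:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes

def rvq_decode(codes, codebooks, dim):
    """Reconstruct a frame by summing the selected codeword of each stage."""
    out = [0.0] * dim
    for cb, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out

def bits_for_depth(depth, codebook_size):
    """Bits spent on one frame: one index of log2(codebook_size) bits per stage."""
    return depth * math.log2(codebook_size)

def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy setup (all values are assumptions for illustration only).
rng = random.Random(0)
DIM, NUM_STAGES, CODEBOOK_SIZE = 4, 4, 16
codebooks = []
for stage in range(NUM_STAGES):
    scale = 0.5 ** stage  # later stages model finer residuals
    cb = [[0.0] * DIM]    # zero codeword: a stage may choose to add nothing
    cb += [[rng.gauss(0, scale) for _ in range(DIM)]
           for _ in range(CODEBOOK_SIZE - 1)]
    codebooks.append(cb)

frames = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
depths = [1, 4, 2]  # hypothetical per-frame allocation (e.g. noise, speech, transition)

all_codes = [rvq_encode(f, codebooks, d) for f, d in zip(frames, depths)]
total_bits = sum(bits_for_depth(d, CODEBOOK_SIZE) for d in depths)
cbr_bits = len(frames) * bits_for_depth(NUM_STAGES, CODEBOOK_SIZE)
print("variable-bitrate bits:", total_bits, "constant-bitrate bits:", cbr_bits)
```

In this toy setting, frames assigned fewer stages cost fewer bits at the price of a coarser reconstruction. A real variable-bitrate codec would also need to signal each frame's depth to the decoder, which this sketch ignores.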
Taxonomy
Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis
