SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization
Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

TL;DR
SwitchCodec is a high-fidelity neural audio codec that combines Residual Experts Vector Quantization (REVQ) with a multi-tiered STFT discriminator, improving quality at low bitrates and supporting multiple bitrates through an efficient post-training strategy.
Contribution
The paper presents REVQ, a novel quantization method, and a multi-tiered discriminator, enabling high-quality, multi-bitrate neural audio compression with reduced training time.
Findings
Achieves PESQ of 2.87 at 2.67 kbps
Reduces spectral blur by 13%
Post-training strategy matches fixed-bitrate performance
Abstract
Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available embedding space is sharply constrained. To address this, we propose a universal high-fidelity neural audio compression algorithm featuring Residual Experts Vector Quantization (REVQ), which substantially expands the embedding space with minimal impact on bandwidth. A gentle load-balancing strategy is introduced to ensure the full utilization of this expanded space. Furthermore, we develop a novel multi-tiered discriminator that periodically stratifies STFT spectra, guiding the generator to focus on critical spectral regions. To support multiple bitrates without quality loss at the lower end, we adopt an efficient post-training strategy. Our proposed…
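To make the bitrate constraint concrete, here is a minimal sketch of the bit budget of a conventional residual vector quantizer, in which every frame emits one index per codebook. It is not the paper's REVQ, and the frame rate, codebook count, and codebook size below are illustrative assumptions rather than SwitchCodec's configuration.

```python
import math


def rvq_bitrate_bps(frame_rate_hz, num_codebooks, codebook_size):
    """Bits per second for a plain residual VQ stream.

    Assumes each frame transmits exactly one index per codebook,
    and each index costs log2(codebook_size) bits.
    """
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index


def codes_per_frame(num_codebooks, codebook_size):
    """Number of distinct index combinations one frame can express."""
    return codebook_size ** num_codebooks


if __name__ == "__main__":
    # Hypothetical settings: 75 frames per second, 1024-entry codebooks.
    for n in (2, 4, 8):
        kbps = rvq_bitrate_bps(75, n, 1024) / 1000
        print(f"{n} codebooks: {kbps} kbps, {codes_per_frame(n, 1024)} codes per frame")
```

In this plain scheme, halving the number of codebooks halves the bitrate but takes the square root of the number of index combinations available per frame, which is one way to read the abstract's remark that the embedding space is sharply constrained at limited bitrates.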
