SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

Jin Wang; Wenbin Jiang; Xiangbo Wang; Yubo You; Sheng Fang

arXiv:2505.24437 · cs.SD · May 8, 2026

TL;DR

SwitchCodec is a high-fidelity neural audio codec that pairs a sparse, expert-based quantizer with a multi-tiered spectral discriminator. It achieves superior quality at low bitrates and efficiently supports multiple bitrates.

Contribution

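The paper presents Residual Experts Vector Quantization (REVQ), which enlarges the quantizer's embedding space at little bandwidth cost, and a multi-tiered discriminator that steers the generator toward critical STFT regions. Together they enable high-quality, multi-bitrate neural audio compression with reduced training time.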
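The abstract describes REVQ only at a high level: an expanded embedding space with minimal bandwidth impact, kept fully used by a gentle load-balancing strategy. The sketch below shows one plausible reading under stated assumptions: a shared codebook always quantizes first, a linear router picks the top-k of several expert codebooks per frame, and the chosen experts quantize the residual in turn. The router, the top-k rule, and every function name here are hypothetical. The auxiliary loss is the Switch Transformer load-balancing loss, swapped in as a stand-in; it is not the paper's gentle strategy.

```python
import math
import random


def nearest(codebook, x):
    """Index and codeword of the entry closest to x (squared L2 distance)."""
    i = min(range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(codebook[j], x)))
    return i, codebook[i]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def revq_encode(x, shared_cb, expert_cbs, router_w, k):
    """Hypothetical REVQ-style encoding of one latent frame x.

    A shared codebook always quantizes first; a linear router (assumption)
    then selects the top-k expert codebooks, which quantize the remaining
    residual in ascending expert-id order. Returns (ids, indices, recon, probs).
    """
    probs = softmax([sum(w * v for w, v in zip(row, x)) for row in router_w])
    chosen = sorted(sorted(range(len(expert_cbs)), key=lambda e: -probs[e])[:k])
    residual, recon, indices = list(x), [0.0] * len(x), []
    for cb in [shared_cb] + [expert_cbs[e] for e in chosen]:
        i, c = nearest(cb, residual)
        indices.append(i)
        recon = [r + v for r, v in zip(recon, c)]
        residual = [r - v for r, v in zip(residual, c)]
    return chosen, indices, recon, probs


def revq_decode(expert_ids, indices, shared_cb, expert_cbs):
    """Rebuild the frame from the transmitted expert ids and code indices."""
    out = [0.0] * len(shared_cb[0])
    for cb, i in zip([shared_cb] + [expert_cbs[e] for e in expert_ids], indices):
        out = [o + v for o, v in zip(out, cb[i])]
    return out


def load_balance_loss(selections, probs_list, n_experts):
    """Switch-Transformer-style auxiliary loss n * sum_e f_e * P_e (stand-in).

    f_e: fraction of frames routed to expert e; P_e: mean router probability
    for expert e. Penalizes routing that concentrates on a few experts.
    """
    n = len(selections)
    f = [sum(e in sel for sel in selections) / n for e in range(n_experts)]
    p = [sum(pr[e] for pr in probs_list) / n for e in range(n_experts)]
    return n_experts * sum(fe * pe for fe, pe in zip(f, p))


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, size, n_experts, k = 4, 8, 4, 2
    shared = random_matrix(rng, size, dim)
    experts = [random_matrix(rng, size, dim) for _ in range(n_experts)]
    router = random_matrix(rng, n_experts, dim)
    frames = random_matrix(rng, 16, dim)
    encoded = [revq_encode(x, shared, experts, router, k) for x in frames]
    print("expert ids for frame 0:", encoded[0][0])
    print("aux loss:", load_balance_loss([enc[0] for enc in encoded],
                                         [enc[3] for enc in encoded], n_experts))
```

Because the decoder needs only the expert ids and code indices, enlarging the expert pool grows the set of reachable reconstructions without adding indices per frame; only the expert choice travels as side information.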

Findings

01

Achieves PESQ of 2.87 at 2.67 kbps

02

Reduces spectral blur by 13%

03

Post-training strategy matches fixed-bitrate performance
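Bitrates such as the 2.67 kbps in Finding 01 follow the usual accounting for residual-VQ codecs: frame rate times the bits sent per frame. The summary does not give SwitchCodec's frame rate or codebook sizes, so the configuration below (44.1 kHz, hop 512, three 1024-entry codebooks, 2 of 8 experts chosen once per 86 frames) is purely hypothetical, as is the helper `expert_selection_bits`. It only illustrates why signaling an expert choice can cost little bandwidth.

```python
import math


def rvq_bitrate_bps(sample_rate, hop, n_codebooks, codebook_size,
                    side_bits_per_frame=0.0):
    """Bits per second of a residual-VQ token stream.

    frame_rate = sample_rate / hop; each frame sends one index per codebook
    (log2(codebook_size) bits each) plus any side information.
    """
    frame_rate = sample_rate / hop
    return frame_rate * (n_codebooks * math.log2(codebook_size) + side_bits_per_frame)


def expert_selection_bits(n_experts, k, frames_per_selection=1):
    """Bits per frame to signal which k of n experts are active, if the
    choice is sent once every `frames_per_selection` frames (hypothetical)."""
    return math.log2(math.comb(n_experts, k)) / frames_per_selection


# Hypothetical configuration, not SwitchCodec's published settings.
base = rvq_bitrate_bps(44100, 512, 3, 1024)
with_side = rvq_bitrate_bps(44100, 512, 3, 1024,
                            side_bits_per_frame=expert_selection_bits(8, 2, 86))
print(f"{base:.1f} bps without side info, {with_side:.1f} bps with it")
```

Under these assumed settings the index stream runs at about 2.58 kbps, and the amortized expert choice adds under 5 bps.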

Abstract

Neural audio compression has emerged as a promising technology for efficiently representing speech, music, and general audio. However, existing methods suffer from significant performance degradation at limited bitrates, where the available embedding space is sharply constrained. To address this, we propose a universal high-fidelity neural audio compression algorithm featuring Residual Experts Vector Quantization (REVQ), which substantially expands the embedding space with minimal impact on bandwidth. A gentle load-balancing strategy is introduced to ensure the full utilization of this expanded space. Furthermore, we develop a novel multi-tiered discriminator that periodically stratifies STFT spectra, guiding the generator to focus on critical spectral regions. To support multiple bitrates without quality loss at the lower end, we adopt an efficient post-training strategy. Our proposed…
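The abstract says the multi-tiered discriminator "periodically stratifies STFT spectra" but gives no further detail. As one hypothetical interpretation, the sketch assigns frequency bin f to tier f mod T, so each tier sees an evenly spaced comb of bins across the whole band; each tier would then feed its own sub-discriminator (not shown). The naive DFT-based `stft_mag`, the modulo rule, and the parameter values are all assumptions for illustration, not the paper's design.

```python
import cmath
import math


def stft_mag(signal, n_fft=16, hop=8):
    """Magnitude STFT with a periodic Hann window (naive DFT, illustration only).

    Returns a list of frames, each holding n_fft // 2 + 1 frequency bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def stratify_periodic(spec, n_tiers):
    """Hypothetical periodic stratification: bin f goes to tier f % n_tiers.

    Output is indexed [tier][frame][bin-within-tier].
    """
    return [[frame[t::n_tiers] for frame in spec] for t in range(n_tiers)]


if __name__ == "__main__":
    # A sine that lands exactly on DFT bin 2 of a 16-point frame.
    sig = [math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
    tiers = stratify_periodic(stft_mag(sig), 3)
    for t, tier in enumerate(tiers):
        print(f"tier {t}: first frame {[round(v, 3) for v in tier[0]]}")
```

A periodic split like this gives every sub-discriminator low-, mid-, and high-frequency content, whereas contiguous bands would give each one a single region; which choice SwitchCodec makes is not stated in this summary.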