ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs
Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling

TL;DR
This paper introduces ERVQ, a novel enhancement for neural audio codecs that mitigates codebook collapse through intra- and inter-codebook optimization, significantly improving audio quality and generalization.
Contribution
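ERVQ addresses codebook collapse in RVQ-based neural audio codecs by combining intra-codebook optimization (online clustering and a code balancing loss) with inter-codebook optimization (minimizing the similarity between successive quantizations).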
Findings
Achieves 100% codebook utilization in advanced neural codecs.
Significantly improves audio quality across models, rates, and sampling frequencies.
Enhances downstream speech synthesis and TTS performance.
Abstract
Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To address this problem, we introduce ERVQ, Enhanced Residual Vector Quantization, a novel enhancement strategy for the RVQ framework in neural audio codecs. ERVQ mitigates codebook collapse and boosts codec performance through both intra- and inter-codebook optimization. Intra-codebook optimization incorporates an online clustering strategy and a code balancing loss to ensure balanced and efficient codebook utilization. Inter-codebook optimization improves the diversity of quantized features by minimizing the similarity between successive quantizations. Our experiments show that ERVQ significantly enhances audio codec performance across different models,…
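To make the quantities in the abstract concrete, here is a minimal NumPy sketch of plain RVQ encoding with two diagnostics: per-codebook utilization (the fraction of codes actually selected, which drops under codebook collapse) and the mean cosine similarity between successive quantized outputs. This is an illustration under stated assumptions, not the authors' implementation: the function names (`rvq_encode`, `utilization`, `usage_entropy_ratio`, `successive_similarity`) are hypothetical, the normalized-entropy measure is only a stand-in for what a code balancing loss might encourage, and the cosine similarity is one plausible reading of "similarity between successive quantizations". The paper's exact losses and its online clustering step are not reproduced here.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Plain residual vector quantization.

    x: (N, D) array of feature vectors.
    codebooks: list of (K, D) arrays, one per quantizer stage.
    Returns per-stage code indices and per-stage quantized vectors.
    """
    residual = np.asarray(x, dtype=float).copy()
    indices, quantized = [], []
    for cb in codebooks:
        # Squared Euclidean distance from every residual to every code: (N, K)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        q = cb[idx]
        indices.append(idx)
        quantized.append(q)
        # The next stage quantizes what this stage failed to capture
        residual = residual - q
    return indices, quantized

def utilization(idx, num_codes):
    """Fraction of codes selected at least once (1.0 means every code is used)."""
    return np.unique(idx).size / num_codes

def usage_entropy_ratio(idx, num_codes):
    """Entropy of the code-usage histogram divided by log(K).

    1.0 means perfectly balanced usage; 0.0 means a single code is used.
    Assumption: a stand-in diagnostic, not the paper's code balancing loss.
    """
    counts = np.bincount(idx, minlength=num_codes).astype(float)
    p = counts / counts.sum()
    nz = p[p > 0]
    entropy = -(nz * np.log(nz)).sum()
    return float(entropy / np.log(num_codes))

def successive_similarity(quantized, eps=1e-8):
    """Mean cosine similarity between outputs of adjacent quantizer stages.

    Assumption: an illustrative proxy for the inter-codebook objective.
    """
    sims = []
    for a, b in zip(quantized[:-1], quantized[1:]):
        num = (a * b).sum(axis=-1)
        den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
        sims.append((num / den).mean())
    return float(np.mean(sims))
```

In a hypothetical training loop, one would track `utilization` per codebook over a batch of encoder outputs; values well below 1.0 indicate the collapse that the abstract describes, and a method in the spirit of ERVQ would aim to push utilization and the entropy ratio up while pushing the successive similarity down.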
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
