Cross-Scale Vector Quantization for Scalable Neural Speech Coding
Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

TL;DR
This paper presents a cross-scale scalable vector quantization (CSVQ) method for neural speech coding that lets a single model serve multiple bitrates; at 3 kbps it outperforms Opus at 9 kbps and Lyra at 3 kbps.
Contribution
The authors introduce CSVQ, a scalable vector quantization scheme that allows neural speech codecs to adapt to different bitrates without retraining.
Findings
CSVQ outperforms classical residual VQ in scalability.
CSVQ at 3 kbps surpasses Opus at 9 kbps and Lyra at 3 kbps.
The scheme provides graceful quality improvement with increased bitrate.
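For readers unfamiliar with the baseline named above, here is a minimal sketch of classical residual VQ and of what bitrate scalability means for it: decoding only the first few stage indices gives a coarse reconstruction, and each further stage refines it. This is not the CSVQ scheme itself. The codebooks are random rather than trained, and the sizes (`DIM`, `STAGES`, `SIZE`) and the zero codeword placed in each codebook are assumptions made only to keep the toy example well behaved.

```python
import random

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def rvq_encode(vec, codebooks):
    """Quantize stage by stage; each stage codes the residual left so far."""
    residual = list(vec)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the codewords for however many stage indices were received."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out

def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy setup: 3 stages, 16 codewords each, 4-dimensional features.
rng = random.Random(0)
DIM, STAGES, SIZE = 4, 3, 16
codebooks = []
for s in range(STAGES):
    scale = 0.5 ** s  # later stages code smaller residuals
    cb = [[rng.gauss(0, scale) for _ in range(DIM)] for _ in range(SIZE)]
    cb[0] = [0.0] * DIM  # assumption: a zero codeword, so a stage never adds error
    codebooks.append(cb)

x = [rng.gauss(0, 1) for _ in range(DIM)]
idx = rvq_encode(x, codebooks)

# Reconstruction error when only the first k stages of the bitstream arrive.
errors = [sq_error(x, rvq_decode(idx[:k], codebooks)) for k in range(STAGES + 1)]
for k, e in enumerate(errors):
    print(f"stages received: {k}  squared error: {e:.4f}")
```

In this toy setup each received stage costs log2(16) = 4 bits per feature vector, so truncating the index list is what trades bitrate for quality.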
Abstract
Bitrate scalability is a desirable feature for audio coding in real-time communications. Existing neural audio codecs usually enforce a specific bitrate during training, so a separate model must be trained for each target bitrate. This increases the memory footprint at both the sender and the receiver, and transcoding is often needed to support multiple receivers. In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement. In this way, a coarse-level signal is reconstructed if only a portion of the bitstream is received, and the quality improves progressively as more bits become available. The proposed CSVQ scheme can be flexibly applied to any neural audio coding network with a mirrored auto-encoder structure to achieve bitrate scalability. Subjective results…
Taxonomy
Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Image and Signal Denoising Methods
