Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization
Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus

TL;DR
This paper introduces a simple scalar quantization approach for neural speech codecs, improving real-time communication at low bitrates with reduced complexity and training overhead.
Contribution
It proposes a scalar quantization method as an alternative to vector quantization, simplifying training and enhancing low-bitrate, real-time neural speech coding.
Findings
Performs well at very low bitrates
Reduces training complexity by avoiding codebook storage
Operates efficiently in real-time with low latency
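The findings above link scalar quantization to low, constant bitrates and to the absence of a stored codebook. A back-of-the-envelope sketch makes both points concrete. With `num_dims` scalar indices per frame, each taking one of `levels` values, the rate is fixed at frame_rate × num_dims × log2(levels) bits per second. A single unstructured VQ that spends the same bits per frame would instead have to store levels^num_dims codewords. The dimension, level count, and frame rate below are hypothetical values chosen for illustration, not settings from the paper.

```python
import math


def sq_bitrate_bps(num_dims: int, levels: int, frame_rate_hz: float) -> float:
    """Constant bitrate of a scalar quantizer: every frame sends num_dims indices,
    each coded with log2(levels) bits."""
    return frame_rate_hz * num_dims * math.log2(levels)


def equivalent_vq_codebook_floats(num_dims: int, levels: int) -> int:
    """Floats a single unstructured VQ codebook would need to store to spend the
    same bits per frame: levels**num_dims codewords of dimension num_dims."""
    return (levels ** num_dims) * num_dims


# Hypothetical configuration, not taken from the paper.
dims, levels, frame_rate = 8, 4, 50.0
print(sq_bitrate_bps(dims, levels, frame_rate))     # 800.0 bits per second
print(equivalent_vq_codebook_floats(dims, levels))  # 524288 stored floats
```

Practical VQ-based codecs usually split their bits across several smaller codebooks (for example, residual VQ), so the single-codebook figure is a worst case rather than a typical storage cost. The uniform SQ grid, by contrast, is fully determined by `levels` and needs no stored table.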
Abstract
Neural audio coding has emerged as an active research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder is learned. This allows for efficient transmission of the input audio signal. The learned discrete representation of neural codecs is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ), and considerable effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose and analyze simple alternatives to VQ, which are based on projected Scalar Quantization (SQ).…
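The abstract describes the quantizer as a step applied to the encoder output, with projected scalar quantization (SQ) proposed in place of a vector quantizer (VQ). The excerpt does not spell out the exact scheme, so what follows is a minimal NumPy sketch of one common form of projected SQ. It assumes a linear projection to a few dimensions, tanh bounding, and a uniform grid of `levels` values per dimension. The function names `projected_scalar_quantize` and `unproject`, the random projection matrices, and the tanh bound are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def projected_scalar_quantize(z: np.ndarray, proj: np.ndarray, levels: int):
    """Project encoder frames to a few dimensions and scalar-quantize each one.

    z:      (frames, feat_dim) encoder output
    proj:   (feat_dim, code_dim) down-projection (learned in a real codec; random here)
    levels: number of uniform quantization levels per dimension (>= 2)
    Returns integer indices (frames, code_dim) and their reconstructed values.
    """
    y = np.tanh(z @ proj)                 # bound every coordinate to [-1, 1]
    step = 2.0 / (levels - 1)             # spacing of the uniform grid on [-1, 1]
    idx = np.clip(np.round((y + 1.0) / step), 0, levels - 1).astype(np.int64)
    y_hat = idx * step - 1.0              # nearest grid value for each coordinate
    return idx, y_hat


def unproject(y_hat: np.ndarray, proj_back: np.ndarray) -> np.ndarray:
    """Map dequantized codes back to the decoder's feature dimension."""
    return y_hat @ proj_back


# Hypothetical sizes: 10 frames of 64-dim features, 6 scalar codes with 5 levels each.
rng = np.random.default_rng(0)
z = rng.standard_normal((10, 64))
proj = rng.standard_normal((64, 6)) / 8.0
proj_back = rng.standard_normal((6, 64)) / 8.0

idx, y_hat = projected_scalar_quantize(z, proj, levels=5)
features = unproject(y_hat, proj_back)
print(idx.shape, features.shape)          # (10, 6) (10, 64)
print(6 * np.log2(5))                     # bits per frame, about 13.9
```

Every frame yields exactly `code_dim` indices from a fixed alphabet, so the bitrate is constant by construction and no codebook has to be learned or stored. During training, rounding has zero gradient almost everywhere, so quantizers of this kind are commonly trained with a straight-through estimator or by replacing rounding with additive uniform noise. That part is omitted here because NumPy has no automatic differentiation.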
Taxonomy
Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Neural Networks and Applications
