Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization

Andreas Brendel; Nicola Pia; Kishan Gupta; Lyonel Behringer; Guillaume Fuchs; Markus Multrus

arXiv:2405.08417 · eess.AS · September 20, 2024

TL;DR

This paper introduces a simple scalar quantization approach for neural speech codecs, improving real-time communication at low bitrates with reduced complexity and training overhead.

Contribution

It proposes a scalar quantization method as an alternative to vector quantization, simplifying training and enhancing low-bitrate, real-time neural speech coding.
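To make the VQ-versus-SQ distinction concrete, here is a minimal Python sketch. A VQ encoder must search a (learned) codebook for the nearest codeword. A scalar quantizer only rounds each coordinate to a fixed grid. The function names, the squared-Euclidean distance, and the uniform grid on [-1, 1] are illustrative assumptions, not the paper's implementation.

```python
import math


def vq_encode(x, codebook):
    """Vector quantization: return the index of the nearest codeword.

    `codebook` is a list of K vectors of dimension D; in a neural codec it
    would be learned jointly with the encoder.
    """
    best_idx, best_dist = 0, math.inf
    for idx, c in enumerate(codebook):
        # Squared Euclidean distance between input and codeword.
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx


def sq_encode(x, levels):
    """Scalar quantization: map each coordinate independently to an index.

    Assumes a fixed uniform grid of `levels` points on [-1, 1]; inputs are
    clipped to that range first. No codebook has to be stored or learned.
    """
    step = 2.0 / (levels - 1)
    return [round((min(max(xi, -1.0), 1.0) + 1.0) / step) for xi in x]
```

For a codebook of K codewords of dimension D, the VQ encoder stores K·D values and performs K distance computations per vector, whereas the scalar quantizer needs only the number of levels.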

Findings

1. Performs well at very low bitrates
2. Reduces training complexity by avoiding a learned codebook
3. Operates efficiently in real time with low latency

Abstract

Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder is learned. This allows for efficient transmission of the input audio signal. The learned discrete representation of neural codecs is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose and analyze simple alternatives to VQ, which are based on projected Scalar Quantization (SQ).…
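The abstract describes projected scalar quantization only at a high level. The Python sketch that follows is a minimal illustration under stated assumptions. A hypothetical projection matrix `W` maps an encoder latent to a few dimensions, `tanh` bounds each value to (-1, 1), and each value is rounded to one of `levels` uniform steps. Every frame always yields the same number of indices, each drawn from a fixed alphabet, so the bits per frame, and hence the bitrate, are constant. The projection, bounding function, and grid are illustrative choices, not taken from the paper.

```python
import math


def project(z, W):
    """Apply a hypothetical M x D projection matrix W to a D-dim latent z."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]


def projected_sq_encode(z, W, levels):
    """Project, bound with tanh to (-1, 1), then round to a uniform grid.

    Rounding is non-differentiable; training a real codec would need a
    gradient estimator (e.g. straight-through), which is omitted here.
    """
    y = [math.tanh(v) for v in project(z, W)]
    step = 2.0 / (levels - 1)
    return [round((v + 1.0) / step) for v in y]


def sq_decode(indices, levels):
    """Map integer indices back to grid values in [-1, 1]."""
    step = 2.0 / (levels - 1)
    return [i * step - 1.0 for i in indices]


def bits_per_frame(num_dims, levels):
    """Fixed cost per frame: each dimension carries log2(levels) bits."""
    return num_dims * math.log2(levels)
```

For example, with a hypothetical 50 frames per second, 8 projected dimensions, and 4 levels, each frame costs `bits_per_frame(8, 4) == 16.0` bits, giving a constant 800 bit/s.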

Taxonomy

Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Neural Networks and Applications