Rate-Aware Learned Speech Compression
Jun Xu, Zhengxue Cheng, Guangchuan Chi, Yuhan Liu, Yuelin Hu, Li Song

TL;DR
This paper introduces a rate-aware learned speech compression method that replaces the conventional quantizer with an entropy model and strengthens the encoder and decoder with more expressive neural blocks, reporting state-of-the-art rate-distortion (RD) performance and a 53.51% BD-Rate bitrate saving.
Contribution
It proposes a rate-aware compression scheme built on an entropy model and enhanced neural blocks, improving RD performance and simplifying training relative to prior neural codecs.
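To make the idea of an entropy model in place of a quantizer codebook concrete, here is a minimal sketch of how such a model can turn rounded latents into an estimated bit cost, and how that cost enters a rate-distortion objective. Everything in it is an assumption for illustration: the discretized logistic distribution, the fixed `mu` and `scale`, the weight `lam`, and the function names are hypothetical, and this summary does not describe the paper's actual entropy model or loss.

```python
import math

def logistic_cdf(x, mu, scale):
    # Logistic CDF written with tanh, which stays finite for large |x|.
    return 0.5 * (1.0 + math.tanh(0.5 * (x - mu) / scale))

def estimated_bits(symbol, mu, scale):
    """Bits needed to code one integer symbol under a discretized logistic model."""
    # Probability mass of the unit-width bin centred on the symbol.
    p = logistic_cdf(symbol + 0.5, mu, scale) - logistic_cdf(symbol - 0.5, mu, scale)
    # Clamp to avoid log(0) for symbols far out in the tails.
    return -math.log2(max(p, 1e-9))

def rd_loss(latents, decoded, target, mu, scale, lam):
    """Rate + lam * distortion for one utterance (illustrative only)."""
    # Round each latent to the nearest integer (Python rounds ties to even).
    symbols = [round(y) for y in latents]
    rate = sum(estimated_bits(s, mu, scale) for s in symbols)
    # Mean squared error between the decoded signal and the target.
    distortion = sum((a - b) ** 2 for a, b in zip(decoded, target)) / len(target)
    return rate + lam * distortion

# Hypothetical values: symbols near the model's mean are cheap, outliers are costly.
print(estimated_bits(0, mu=0.0, scale=1.0))
print(estimated_bits(5, mu=0.0, scale=1.0))
```

Because the bit cost is part of the objective, a model trained this way can trade bitrate against quality through `lam` rather than through a fixed codebook size, which is one plausible reading of "rate-aware" here.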
Findings
Achieves 53.51% BD-Rate bitrate saving.
Gains of 0.26 BD-ViSQOL and 0.44 BD-PESQ.
Outperforms existing neural speech codecs.
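The BD-Rate figure above is a Bjøntegaard-delta metric: the average bitrate difference between two codecs at equal quality. A minimal sketch of the usual calculation follows, assuming the common recipe of a cubic fit of log-bitrate against quality. The paper's exact evaluation setup (quality metric, bitrate points, baseline codec) is not given in this summary, and the function name `bd_rate` and all numbers in the example are made up for illustration.

```python
import numpy as np

def bd_rate(rate_ref, qual_ref, rate_test, qual_test):
    """Average bitrate change (%) of the test codec vs. the reference at equal quality."""
    log_ref = np.log(np.asarray(rate_ref, dtype=float))
    log_test = np.log(np.asarray(rate_test, dtype=float))
    qual_ref = np.asarray(qual_ref, dtype=float)
    qual_test = np.asarray(qual_test, dtype=float)

    # Fit log-bitrate as a cubic polynomial of quality for each codec.
    poly_ref = np.polyfit(qual_ref, log_ref, 3)
    poly_test = np.polyfit(qual_test, log_test, 3)

    # Integrate both fits over the quality range the two curves share.
    lo = max(qual_ref.min(), qual_test.min())
    hi = min(qual_ref.max(), qual_test.max())
    int_ref = np.polyval(np.polyint(poly_ref), [lo, hi])
    int_test = np.polyval(np.polyint(poly_test), [lo, hi])
    area_ref = int_ref[1] - int_ref[0]
    area_test = int_test[1] - int_test[0]

    # Mean log-bitrate gap over the shared range, converted back to a percentage.
    avg_log_diff = (area_test - area_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical RD points: the test codec needs half the bitrate at every quality level.
quality = [2.0, 2.6, 3.2, 3.8]
ref_kbps = [1.5, 3.0, 6.0, 12.0]
test_kbps = [0.75, 1.5, 3.0, 6.0]
print(bd_rate(ref_kbps, quality, test_kbps, quality))  # close to -50
```

Under this sign convention a saving shows up as a negative value, so a 53.51% bitrate saving corresponds to a BD-Rate of about -53.51%.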
Abstract
The rapid rise of real-time communication and large language models has significantly increased the importance of speech compression. Deep learning-based neural speech codecs have outperformed traditional signal-level speech codecs in terms of rate-distortion (RD) performance. Typically, these neural codecs employ an encoder-quantizer-decoder architecture, where audio is first converted into latent code feature representations and then into discrete tokens. However, this architecture exhibits insufficient RD performance due to two main drawbacks: (1) the inadequate performance of the quantizer, challenging training processes, and issues such as codebook collapse; (2) the limited representational capacity of the encoder and decoder, making it difficult to meet feature representation requirements across various bitrates. In this paper, we propose a rate-aware learned speech compression…
Taxonomy
Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Softmax · Attention Is All You Need · Convolution
