Rate-Aware Learned Speech Compression

Jun Xu; Zhengxue Cheng; Guangchuan Chi; Yuhan Liu; Yuelin Hu; Li Song

arXiv:2501.11999 · eess.AS · January 22, 2025

TL;DR

This paper introduces a rate-aware neural speech compression method. It replaces the codebook-based quantizers of prior neural codecs with an entropy model and uses enhanced neural blocks, achieving state-of-the-art rate-distortion performance and a 53.51% BD-Rate bitrate saving.
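To make "replaces quantizers with an entropy model" concrete: in learned compression, latents are usually rounded to integers and entropy-coded, and the entropy model assigns each integer a probability whose negative log2 is its ideal code length. The sketch below assumes a discretized Gaussian entropy model with a per-symbol mean and scale, plus a standard R + λD training objective. These choices are common in learned compression and are illustrative assumptions, not details confirmed by this summary; the helper names (`symbol_bits`, `estimate_rate`, `rd_loss`) are hypothetical.

```python
import math

def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of the normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def symbol_bits(y_hat: int, mu: float, sigma: float) -> float:
    """Ideal code length (bits) of an integer symbol under a discretized Gaussian.

    The probability of y_hat is the Gaussian mass on [y_hat - 0.5, y_hat + 0.5].
    """
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    p = max(p, 1e-9)  # floor the probability to avoid log(0) for very unlikely symbols
    return -math.log2(p)

def estimate_rate(latents, mus, sigmas) -> float:
    """Total estimated bits: round each latent, then sum its code length."""
    return sum(symbol_bits(round(y), m, s) for y, m, s in zip(latents, mus, sigmas))

def rd_loss(rate_bits: float, distortion: float, lam: float) -> float:
    """Rate-distortion Lagrangian L = R + lambda * D (hypothetical objective)."""
    return rate_bits + lam * distortion
```

For example, `estimate_rate([0.2, -1.1, 3.4], [0.0, -1.0, 3.0], [1.0, 1.0, 1.0])` rounds the latents to `[0, -1, 3]` and sums their code lengths. A symbol that sits at the predicted mean costs fewer bits when the predicted scale is smaller, so the model can spend bits adaptively rather than paying a fixed cost per index.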

Contribution

It proposes a rate-aware compression scheme built on an entropy model and enhanced neural blocks, which improves rate-distortion (RD) performance and simplifies training compared with prior neural codecs.

Findings

1. Achieves a 53.51% bitrate saving in BD-Rate.
2. Improves BD-VisQol by 0.26 and BD-PESQ by 0.44.
3. Outperforms existing neural speech codecs.
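The BD-Rate and BD-quality figures in the findings come from the Bjøntegaard delta method, which compares two rate-distortion curves over the range where they overlap. The sketch below follows the common formulation (cubic polynomial fits against log bitrate, integrated over the shared interval). The paper's exact evaluation script, anchor codec, and bitrate points are not given in this summary; the function names are hypothetical and the numbers in the usage note are made up for illustration.

```python
import numpy as np

def _poly_integral(x, y, lo, hi):
    """Fit y as a cubic polynomial in x, then integrate that fit over [lo, hi]."""
    antiderivative = np.polyint(np.polyfit(x, y, 3))
    return np.polyval(antiderivative, hi) - np.polyval(antiderivative, lo)

def bd_rate(rate_ref, qual_ref, rate_test, qual_test):
    """Bjontegaard delta rate in percent; negative means the test codec saves bits."""
    log_ref, log_test = np.log(rate_ref), np.log(rate_test)
    # Integrate log-rate over the quality interval covered by both curves.
    lo = max(np.min(qual_ref), np.min(qual_test))
    hi = min(np.max(qual_ref), np.max(qual_test))
    avg_log_diff = (_poly_integral(qual_test, log_test, lo, hi)
                    - _poly_integral(qual_ref, log_ref, lo, hi)) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)

def bd_quality(rate_ref, qual_ref, rate_test, qual_test):
    """Bjontegaard delta quality (e.g. BD-PESQ): mean quality gain at equal bitrate."""
    log_ref, log_test = np.log(rate_ref), np.log(rate_test)
    # Integrate quality over the log-rate interval covered by both curves.
    lo = max(np.min(log_ref), np.min(log_test))
    hi = min(np.max(log_ref), np.max(log_test))
    return float((_poly_integral(log_test, qual_test, lo, hi)
                  - _poly_integral(log_ref, qual_ref, lo, hi)) / (hi - lo))
```

With made-up points where a test codec reaches the same PESQ at exactly half of the reference bitrates (6, 12, 24, 48 kbps), `bd_rate` returns about -50, i.e. a 50% saving. Raising every PESQ value by 0.3 at equal bitrates makes `bd_quality` return about 0.3. Each curve needs at least four points for the cubic fit.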

Abstract

The rapid rise of real-time communication and large language models has significantly increased the importance of speech compression. Deep learning-based neural speech codecs have outperformed traditional signal-level speech codecs in terms of rate-distortion (RD) performance. Typically, these neural codecs employ an encoder-quantizer-decoder architecture, where audio is first converted into latent code feature representations and then into discrete tokens. However, this architecture exhibits insufficient RD performance due to two main drawbacks: (1) the inadequate performance of the quantizer, challenging training processes, and issues such as codebook collapse; (2) the limited representational capacity of the encoder and decoder, making it difficult to meet feature representation requirements across various bitrates. In this paper, we propose a rate-aware learned speech compression…
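The encoder-quantizer-decoder pipeline and the codebook collapse problem named in the abstract can be illustrated with a single-codebook vector quantizer: each latent frame is replaced by the index of its nearest codeword, so without entropy coding every token costs log2(K) bits for a codebook of K entries. Collapse shows up as only a few codewords ever being selected, which the fraction of used codewords and the perplexity of the token histogram expose. This is a generic NumPy sketch with hypothetical function names, not the architecture of the paper or of any specific codec it compares against.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent frame (T x D) to its nearest codeword in a (K x D) codebook."""
    # Squared Euclidean distance between every frame and every codeword: shape (T, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)      # one discrete token per frame
    return tokens, codebook[tokens]    # token indices and the quantized latents

def codebook_usage(tokens, codebook_size):
    """Return (fraction of codewords used, perplexity of the token histogram).

    Low values of either are the usual symptom of codebook collapse.
    """
    counts = np.bincount(tokens, minlength=codebook_size)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]
    perplexity = float(np.exp(-(nonzero * np.log(nonzero)).sum()))
    return float((counts > 0).mean()), perplexity
```

A perfectly balanced codebook has perplexity equal to K; if training drives most frames to a handful of codewords, the perplexity drops toward that handful and the effective bitrate falls below log2(K) bits per token while the unused entries are wasted.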

Taxonomy

Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Softmax · Attention Is All You Need · Convolution