Practical cognitive speech compression

Reza Lotfidereshgi; Philippe Gournay

arXiv:2203.04415·eess.AS·March 10, 2022

Open Access

TL;DR

This paper introduces a practical neural speech compression method that achieves low bitrate, low latency, and high subjective quality comparable to standard codecs, suitable for mobile devices.

Contribution

The method combines a hierarchical, unsupervised cognitive-coding encoder with a GAN-based decoder, improving speech quality at low bitrate and remaining robust to quantization of the representation features.

Findings

01 Outperforms AMR-WB codec in delay, bitrate, and subjective quality

02 Robust to quantization of representation features

03 Suitable for mobile device implementation

Abstract

This paper presents a new neural speech compression method that is practical in the sense that it operates at low bitrate, introduces a low latency, is compatible in computational complexity with current mobile devices, and provides a subjective quality that is comparable to that of standard mobile-telephony codecs. Other recently proposed neural vocoders also have the ability to operate at low bitrate. However, they do not produce the same level of subjective quality as standard codecs. On the other hand, standard codecs rely on objective and short-term metrics such as the segmental signal-to-noise ratio that correlate only weakly with perception. Furthermore, standard codecs are less efficient than unsupervised neural networks at capturing speech attributes, especially long-term ones. The proposed method combines a cognitive-coding encoder that extracts an interpretable unsupervised…

Taxonomy

Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis