Practical cognitive speech compression
Reza Lotfidereshgi, Philippe Gournay

TL;DR
This paper introduces a practical neural speech compression method that operates at low bitrate and low latency, delivers subjective quality comparable to that of standard mobile-telephony codecs, and is light enough computationally for current mobile devices.
Contribution
It combines a hierarchical unsupervised encoder with a GAN-based decoder, improving speech quality at low bitrate and demonstrating robustness to quantization.
Findings
Outperforms the AMR-WB codec in delay, bitrate, and subjective quality
Robust to quantization of representation features
Suitable for mobile device implementation
Abstract
This paper presents a new neural speech compression method that is practical in the sense that it operates at low bitrate, introduces a low latency, is compatible in computational complexity with current mobile devices, and provides a subjective quality that is comparable to that of standard mobile-telephony codecs. Other recently proposed neural vocoders also have the ability to operate at low bitrate. However, they do not produce the same level of subjective quality as standard codecs. On the other hand, standard codecs rely on objective and short-term metrics such as the segmental signal-to-noise ratio that correlate only weakly with perception. Furthermore, standard codecs are less efficient than unsupervised neural networks at capturing speech attributes, especially long-term ones. The proposed method combines a cognitive-coding encoder that extracts an interpretable unsupervised…
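The abstract refers to the segmental signal-to-noise ratio as an example of a short-term objective metric. As a rough illustration of why it is "short-term", the sketch below computes a per-frame SNR and averages it over frames, so each frame is scored in isolation. This is not taken from the paper: the function name, the 320-sample frame (20 ms at 16 kHz), and the clamp limits of -10 dB and 35 dB are assumptions chosen for illustration.

```python
import math


def segmental_snr(reference, decoded, frame_len=320, floor_db=-10.0, ceil_db=35.0):
    """Average per-frame SNR in dB between a reference and a decoded signal.

    Assumptions (not from the paper): 320-sample frames and per-frame
    clamping to [floor_db, ceil_db] before averaging.
    """
    n_frames = min(len(reference), len(decoded)) // frame_len
    if n_frames == 0:
        raise ValueError("signals must contain at least one full frame")

    eps = 1e-12  # avoids division by zero and log of zero
    total_db = 0.0
    for i in range(n_frames):
        start = i * frame_len
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - d) ** 2 for r, d in zip(ref, dec))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp so that silent or perfectly coded frames do not dominate the mean.
        total_db += min(max(snr_db, floor_db), ceil_db)
    return total_db / n_frames


if __name__ == "__main__":
    # A 200 Hz tone sampled at 16 kHz, compared with a slightly attenuated copy.
    tone = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
    attenuated = [0.9 * s for s in tone]
    print(f"segmental SNR: {segmental_snr(tone, attenuated):.2f} dB")
```

Because every frame is scored independently against the waveform, a metric of this kind has no view of attributes that span many frames, which is consistent with the abstract's remark that such metrics correlate only weakly with perception.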
Taxonomy
Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis
