Feedback Recurrent AutoEncoder
Yang Yang, Guillaume Sautière, J. Jon Ryu, Taco S. Cohen

TL;DR
This paper introduces the Feedback Recurrent AutoEncoder (FRAE), a novel architecture for online sequential data compression that effectively captures temporal redundancy, enabling high-quality speech waveform reconstruction at low bitrates.
Contribution
The paper presents a new recurrent autoencoder architecture, FRAE, for online compression of sequential data with temporal dependency. Its recurrent structure extracts redundancy along the time dimension so that a compact discrete representation of the data can be learned.
Findings
FRAE is effective at compressing speech spectrograms into a compact discrete representation.
Combining FRAE with a neural vocoder yields high-quality speech at low fixed bitrates.
Adding a learned prior over the latent space and an entropy coder yields an even lower, variable bitrate.
Abstract
In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate.
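The difference between the fixed bitrate and the lower variable bitrate mentioned in the abstract can be illustrated with a little arithmetic. The sketch below is not the paper's implementation: the codebook size, frame rate, prior probabilities, and code sequence are all made-up values, and the prior is a static table rather than a learned model. It only shows why an entropy coder driven by a prior can spend fewer bits on average than a fixed-length code over the same discrete latents.

```python
import math

def fixed_bitrate_bps(codebook_size, frames_per_second):
    """Bits per second when every frame's code index uses a fixed-length code."""
    bits_per_frame = math.ceil(math.log2(codebook_size))
    return bits_per_frame * frames_per_second

def entropy_coded_bitrate_bps(symbols, prior, frames_per_second):
    """Average bits per second for an ideal entropy coder that codes
    `symbols` using the probabilities in `prior` (-log2 p bits per symbol)."""
    total_bits = sum(-math.log2(prior[s]) for s in symbols)
    return total_bits / len(symbols) * frames_per_second

# Hypothetical values, chosen only for illustration.
codebook_size = 4
frames_per_second = 100
prior = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}  # stand-in for a learned prior
symbols = [0, 0, 0, 0, 1, 1, 2, 3]              # stand-in for encoder output

fixed = fixed_bitrate_bps(codebook_size, frames_per_second)
variable = entropy_coded_bitrate_bps(symbols, prior, frames_per_second)
print(fixed, variable)  # 200 175.0
```

The fixed-length code always costs 2 bits per frame here, whereas the entropy-coded cost depends on how probable each code is under the prior, so the resulting bitrate varies with the content.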
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Data Compression Techniques · Speech and Audio Processing
