Feedback Recurrent AutoEncoder

Yang Yang; Guillaume Sautière; J. Jon Ryu; Taco S Cohen

arXiv:1911.04018 · cs.LG · February 18, 2020 · 1 citation

Open Access

TL;DR

This paper introduces the Feedback Recurrent AutoEncoder (FRAE), a novel architecture for online sequential data compression that effectively captures temporal redundancy, enabling high-quality speech waveform reconstruction at low bitrates.

Contribution

The paper presents a new recurrent autoencoder architecture, FRAE, for online compression of sequential data with temporal dependency. Its recurrent structure extracts redundancy along the time dimension and learns a compact discrete representation of the data.

Findings

01

FRAE effectively compresses speech spectrograms with high quality.

02

Combining FRAE with a neural vocoder yields high-quality speech at low fixed bitrates.

03

Adding a learned prior for the latent space and using an entropy coder achieves an even lower, variable bitrate.

Abstract

In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate.
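
As a rough illustration of the last point in the abstract, the sketch below compares the cost of sending discrete latent codes at a fixed bitrate with the ideal cost when an entropy coder uses a prior over the codes. Everything in it is hypothetical: the codebook size, the probabilities in `prior`, the `symbols` sequence, and the function names are made up for illustration, and the static prior is a simplification that stands in for the learned prior described in the paper.

```python
import math


def fixed_bits(num_symbols, codebook_size):
    """Bits needed when every symbol gets the same fixed-length code."""
    return num_symbols * math.ceil(math.log2(codebook_size))


def entropy_coded_bits(symbols, prior):
    """Ideal code length, in bits, of a symbol sequence under a prior.

    An ideal entropy coder spends -log2 p(s) bits on symbol s, so likely
    symbols are cheap and unlikely ones are expensive.
    """
    return sum(-math.log2(prior[s]) for s in symbols)


# Hypothetical setup: 4 discrete codes and an assumed (not learned) prior.
codebook_size = 4
prior = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}

# Hypothetical sequence of latent codes emitted by an encoder.
symbols = [0, 0, 1, 0, 2, 0, 1, 3]

fixed = fixed_bits(len(symbols), codebook_size)
ideal = entropy_coded_bits(symbols, prior)

print(f"fixed-rate cost:    {fixed} bits")
print(f"entropy-coded cost: {ideal:.1f} bits")
```

In this toy setting the fixed-rate cost is 16 bits and the ideal entropy-coded cost is 14 bits. The saving depends on how well the prior matches the symbols actually emitted, which is why the resulting bitrate varies from one sequence to the next.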

Taxonomy

Topics: Speech Recognition and Synthesis · Advanced Data Compression Techniques · Speech and Audio Processing
