High Fidelity Neural Audio Compression

Alexandre Défossez; Jade Copet; Gabriel Synnaeve; Yossi Adi

arXiv:2210.13438 · eess.AS · October 25, 2022 · 280 cites

TL;DR

This paper presents a neural audio codec that delivers high-fidelity, real-time compression across several audio types. It stabilizes training with a novel loss balancer and uses lightweight Transformers to compress the resulting representation further.

Contribution

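The paper introduces a neural audio codec whose training is stabilized by a loss balancer, and shows that lightweight Transformers can compress the codec's representation by up to 40% while staying faster than real time. The codec is reported to outperform existing methods.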

Findings

01 Superior audio quality across multiple domains and bandwidths

02 Up to 40% additional compression with lightweight Transformers

03 Stable training achieved through a novel loss balancer mechanism
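To make the 40% figure concrete, here is a minimal arithmetic sketch. The starting bitrates are hypothetical values chosen for illustration and are not taken from this page; the only number from the source is the reduction of up to 40%.

```python
def compressed_bitrate(bitrate_kbps, reduction):
    """Return the bitrate left after a further relative reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return bitrate_kbps * (1.0 - reduction)


# Hypothetical starting bitrates (assumption, for illustration only).
for kbps in (3.0, 6.0, 12.0):
    best_case = compressed_bitrate(kbps, 0.40)  # 0.40 is the upper bound quoted above
    print(f"{kbps:.1f} kbps -> {best_case:.1f} kbps")
```

Because the source states "up to 40%", these values should be read as a best case rather than a typical result.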

Abstract

We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion. We simplify and speed up the training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model including: training objective,…
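The loss balancer idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name, the `total_norm` parameter, the loss names, and the use of the instantaneous gradient norm are all assumptions made for illustration (a practical version would likely smooth the norm over recent batches). Each per-loss gradient is normalized and then rescaled so that its norm equals its weight's share of a fixed total.

```python
import math


def balance_gradients(grads, weights, total_norm=1.0, eps=1e-12):
    """Rescale per-loss gradients so each one's norm is its weight's share of total_norm.

    grads:   dict mapping loss name -> gradient w.r.t. the model output (list of floats)
    weights: dict mapping loss name -> non-negative weight
    """
    weight_sum = sum(weights[name] for name in grads)
    balanced = {}
    for name, grad in grads.items():
        norm = math.sqrt(sum(x * x for x in grad))
        # Dividing by the norm removes the loss's natural scale;
        # the weight fraction then sets its share of the total gradient.
        scale = total_norm * (weights[name] / weight_sum) / (norm + eps)
        balanced[name] = [scale * x for x in grad]
    return balanced


# Two hypothetical losses whose raw gradients differ in scale by a factor of 1000.
grads = {"reconstruction": [3.0, 4.0], "adversarial": [0.003, 0.004]}
weights = {"reconstruction": 1.0, "adversarial": 3.0}

balanced = balance_gradients(grads, weights)
norms = {k: math.sqrt(sum(x * x for x in v)) for k, v in balanced.items()}
print(norms)  # shares follow the 1:3 weights, regardless of the raw gradient scales
```

In this sketch the weights act as fractions: changing the raw scale of either loss leaves the balanced norms unchanged, which is the decoupling the abstract describes.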

Code & Models

Videos

High Fidelity Neural Audio Compression | Paper & Code Explained · YouTube

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Image and Signal Denoising Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Absolute Position Encodings · Layer Normalization