High-Fidelity Audio Compression with Improved RVQGAN

Rithesh Kumar; Prem Seetharaman; Alejandro Luebs; Ishaan Kumar; Kundan Kumar

arXiv:2306.06546 · cs.SD · October 30, 2023 · 24 citations

Open Access · 4 Repos · 10 Models · 2 Datasets · 1 Video

TL;DR

This paper presents a universal neural audio compression method that compresses 44.1 kHz audio by roughly 90x, into tokens at 8 kbps, across speech, environmental sound, and music, using improved vector quantization together with adversarial and reconstruction losses.

Contribution

Introduces a universal neural audio compression algorithm that combines high-fidelity audio generation with improved vector quantization, handles all audio domains with a single model, and is reported to outperform competing audio compression algorithms.
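The "RVQ" in the title refers to residual vector quantization. As a rough illustration only, the sketch below shows the plain textbook form of the idea: each stage picks the nearest entry in its codebook and passes the leftover residual to the next stage. It is not the paper's improved quantizer, and the tiny hand-written codebooks and the function names (`nearest`, `rvq_encode`, `rvq_decode`) are hypothetical, chosen for this example.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the previous residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a finer one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, -0.5]],
]
x = [1.4, 0.6]
codes = rvq_encode(x, codebooks)
x_hat = rvq_decode(codes, codebooks)
print(codes, x_hat)
```

Each added stage refines the approximation, so the sequence of indices in `codes` is what a codec of this kind would transmit as discrete tokens.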

Findings

01. Achieves ~90x compression at 8 kbps for 44.1 kHz audio.

02. Outperforms existing audio compression algorithms.

03. Provides comprehensive ablation studies and open-source resources.

Abstract

Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high quality neural compression model that can compress high-dimensional natural signals into lower dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 KHz audio into tokens at just 8kbps bandwidth. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses. We compress all domains (speech, environment, music, etc.) with a single universal model, making it widely applicable to generative modeling of all audio. We compare with competing audio compression algorithms, and find our method outperforms them…
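The ~90x figure can be sanity-checked with simple arithmetic. The sketch below assumes the uncompressed reference is 16-bit mono PCM, which the text above does not state; under that assumption, 44.1 kHz audio takes 705.6 kbps, and dividing by the 8 kbps target gives about 88x, consistent with the rounded ~90x.

```python
SAMPLE_RATE_HZ = 44_100   # from the abstract
TARGET_KBPS = 8           # from the abstract
BITS_PER_SAMPLE = 16      # assumption: 16-bit PCM reference
CHANNELS = 1              # assumption: mono audio


def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


raw_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
ratio = raw_kbps / TARGET_KBPS
print(f"raw: {raw_kbps:.1f} kbps, compression ratio: {ratio:.1f}x")
```

With a different reference format (for example stereo or 24-bit audio) the ratio would change accordingly, so the result is only as good as the assumed baseline.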

Code & Models

Videos

High-Fidelity Audio Compression with Improved RVQGAN · SlidesLive

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies