SNAC: Multi-Scale Neural Audio Codec

Hubert Siuzdak; Florian Grötschla; Luca A. Lanzendörfer

arXiv:2410.14411·cs.SD·October 21, 2024

Open Access · 1 Repo · 2 Models

TL;DR

This paper introduces SNAC, the Multi-Scale Neural Audio Codec: an extension of Residual Vector Quantization (RVQ) whose quantizers operate at multiple temporal resolutions. The authors report more efficient audio compression, supported by objective and subjective evaluations.

Contribution

The paper proposes a hierarchy of quantizers running at different frame rates, so that the codec adapts to audio structure across multiple timescales.

Findings

01 Achieves better compression efficiency than standard RVQ

02 Demonstrates high-fidelity audio reconstruction at low bitrates

03 Open-sources code and models for reproducibility
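
The efficiency claim can be made concrete with a little arithmetic. The sketch below compares the token bitrate of a single-rate RVQ stack with a multi-scale stack that runs its coarser levels at lower frame rates. The frame rates and codebook size are hypothetical values chosen for illustration, not figures from the paper, and `bitrate_bps` is a helper name introduced here.

```python
import math

def bitrate_bps(frame_rates_hz, codebook_size):
    """Bits per second when each level emits one code per frame."""
    bits_per_code = math.log2(codebook_size)
    return sum(rate * bits_per_code for rate in frame_rates_hz)

# Hypothetical configuration: three codebooks of 1024 entries (10 bits each).
codebook_size = 1024

# Standard RVQ: every level runs at the same frame rate.
single_scale = bitrate_bps([40, 40, 40], codebook_size)

# Multi-scale: coarser levels run at a fraction of the finest frame rate.
multi_scale = bitrate_bps([10, 20, 40], codebook_size)

print(single_scale, multi_scale)  # 1200.0 700.0
```

With these assumed numbers the multi-scale stack spends 700 bits per second where the single-rate stack spends 1200. The arithmetic says nothing about reconstruction quality at the lower bitrate; that is the empirical question the paper's evaluations address.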

Abstract

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual Vector Quantization (RVQ) has become the standard technique for neural audio compression using a cascade of VQ codebooks. This paper proposes the Multi-Scale Neural Audio Codec, a simple extension of RVQ where the quantizers can operate at different temporal resolutions. By applying a hierarchy of quantizers at variable frame rates, the codec adapts to the audio structure across multiple timescales. This leads to more efficient compression, as demonstrated by extensive objective and subjective evaluations. The code and model weights are open-sourced at https://github.com/hubertsiuzdak/snac.
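
To illustrate the idea of quantizers operating at different temporal resolutions, here is a minimal pure-Python sketch of multi-scale residual quantization. It is not the authors' implementation: the average-pooling downsampler, the upsampling by repetition, the random codebooks, and all function names are assumptions made for this example. The official PyTorch code lives in the repository linked above.

```python
import random

def avg_pool(frames, stride):
    """Downsample in time by averaging each group of `stride` frames.
    Assumes len(frames) is divisible by stride."""
    dim = len(frames[0])
    return [
        [sum(f[d] for f in frames[i:i + stride]) / stride for d in range(dim)]
        for i in range(0, len(frames), stride)
    ]

def nearest_code(vec, codebook):
    """Index of the codebook entry closest to `vec` (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])),
    )

def multiscale_rvq_encode(frames, codebooks, strides):
    """Quantize the residual at each level's own frame rate.
    Returns (codes per level, summed reconstruction at full rate)."""
    residual = [list(f) for f in frames]
    recon = [[0.0] * len(frames[0]) for _ in frames]
    all_codes = []
    for codebook, stride in zip(codebooks, strides):
        coarse = avg_pool(residual, stride)           # lower frame rate
        codes = [nearest_code(v, codebook) for v in coarse]
        all_codes.append(codes)
        for t in range(len(frames)):
            q = codebook[codes[t // stride]]          # upsample by repetition
            for d in range(len(q)):
                recon[t][d] += q[d]
                residual[t][d] -= q[d]                # pass the error onward
    return all_codes, recon

# Hypothetical toy setup: 8 latent frames of dimension 4, 16 codes per level.
random.seed(0)
T, D, K = 8, 4, 16
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
strides = [4, 2, 1]  # coarsest level first
codebooks = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
             for _ in strides]

codes, recon = multiscale_rvq_encode(frames, codebooks, strides)
code_lengths = [len(c) for c in codes]
print(code_lengths)  # [2, 4, 8]
```

The coarsest level emits 2 codes for the same 8 frames that the finest level covers with 8, which is where the saving over single-rate RVQ comes from. In a trained codec the codebooks are learned rather than random.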

Code & Models

Repositories

hubertsiuzdak/snac (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Neural Networks and Applications