Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding

Marco Pasini; Stefan Lattner; George Fazekas

arXiv:2501.17578·cs.SD·January 30, 2025

Open Access

TL;DR

Music2Latent2 introduces a novel audio autoencoder that uses unordered summary embeddings and autoregressive consistency models to achieve high-quality audio compression and reconstruction, improving over existing methods.

Contribution

It presents a new autoencoder architecture utilizing summary embeddings and autoregressive models for better audio compression and fidelity.

Findings

01

Outperforms existing autoencoders in audio quality

02

Achieves higher compression ratios with maintained fidelity

03

Enhances downstream task performance

Abstract

Efficiently compressing high-dimensional audio signals into a compact and informative latent space is crucial for various tasks, including generative modeling and music information retrieval (MIR). Existing audio autoencoders, however, often struggle to achieve high compression ratios while preserving audio fidelity and facilitating efficient downstream applications. We introduce Music2Latent2, a novel audio autoencoder that addresses these limitations by leveraging consistency models and a novel approach to representation learning based on unordered latent embeddings, which we call summary embeddings. Unlike conventional methods that encode local audio features into ordered sequences, Music2Latent2 compresses audio signals into sets of summary embeddings, where each embedding can capture distinct global features of the input sample. This makes it possible to achieve higher reconstruction quality…
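
To make the contrast between an ordered sequence of local features and an unordered set of global embeddings more concrete, here is a minimal NumPy sketch. It is not the Music2Latent2 architecture: the attention pooling with random query vectors, the function name `summarize`, and all shapes are assumptions chosen purely for illustration. It shows only that each of the K output embeddings is a weighted combination of all T input frames, and that reordering the queries merely reorders the resulting set.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summarize(frames, queries):
    """Hypothetical attention pooling: (T, D) frames -> (K, D) summary set.

    Each query attends over every frame, so each output row can draw on
    the whole input rather than on a local window.
    """
    scores = queries @ frames.T / np.sqrt(frames.shape[1])  # (K, T)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ frames                                 # (K, D)

rng = np.random.default_rng(0)
T, D, K = 128, 16, 8                      # illustrative sizes, not from the paper
frames = rng.standard_normal((T, D))      # stand-in for local audio features
queries = rng.standard_normal((K, D))     # stand-in for learned query vectors

summary = summarize(frames, queries)

# Permuting the queries only permutes the rows of the output:
# the set of summary embeddings is unchanged.
perm = rng.permutation(K)
summary_perm = summarize(frames, queries[perm])
same_set = np.allclose(summary_perm, summary[perm])
```

In this toy setup the K rows of `summary` carry no positional meaning, which is the sense of "unordered" illustrated here; how the actual model trains and decodes such embeddings is described in the paper itself.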

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Consistency Models