Music2Latent: Consistency Autoencoders for Latent Audio Compression

Marco Pasini; Stefan Lattner; George Fazekas

arXiv:2408.06500 · cs.SD · August 14, 2024 · 2 cites

TL;DR

Music2Latent is an end-to-end consistency autoencoder for audio. It reconstructs high-fidelity audio in a single step, outperforms existing autoencoders in sound quality and reconstruction accuracy, and produces latents that are competitive on MIR tasks.

Contribution

It presents the first successful end-to-end consistency autoencoder for audio, integrating frequency-wise self-attention and learned scaling for improved performance.
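
As a rough illustration of what scaling "frequency-wise" can mean, the NumPy sketch below multiplies each frequency bin of a toy spectrogram by its own factor. The function name `frequency_wise_scale`, the `log_scale` parameter vector, and the tensor layout are all assumptions made for this example. In a trained model the factors would be learned; here they are set analytically so the effect is easy to check. This is not the paper's parameterization.

```python
import numpy as np


def frequency_wise_scale(spec, log_scale):
    """Scale each frequency bin by its own factor.

    spec:      array of shape (channels, freq_bins, frames)
    log_scale: array of shape (freq_bins,), a hypothetical learnable
               parameter stored in log space so the factor stays positive
    """
    if spec.shape[1] != log_scale.shape[0]:
        raise ValueError("log_scale needs one entry per frequency bin")
    # Broadcast the per-bin factor over channels and frames.
    return spec * np.exp(log_scale)[None, :, None]


rng = np.random.default_rng(0)
freq_bins = 8

# Toy spectrogram whose magnitude shrinks as frequency increases.
decay = np.linspace(1.0, 0.01, freq_bins)
spec = rng.standard_normal((2, freq_bins, 16)) * decay[None, :, None]

# Stand-in for learned values (assumption): equalise the spread of every bin.
log_scale = -np.log(spec.std(axis=(0, 2)))

scaled = frequency_wise_scale(spec, log_scale)
per_bin_std = scaled.std(axis=(0, 2))  # one value per frequency bin
```

After scaling, every bin has a comparable spread, which is the kind of mismatch a per-frequency factor can absorb.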

Findings

01 Outperforms existing autoencoders in sound quality and reconstruction accuracy

02 Enables high-fidelity single-step audio reconstruction

03 Achieves competitive results on MIR tasks

Abstract

Efficient audio representations in a compressed continuous latent space are critical for generative audio modeling and Music Information Retrieval (MIR) tasks. However, some existing audio autoencoders have limitations, such as multi-stage training procedures, slow iterative sampling, or low reconstruction quality. We introduce Music2Latent, an audio autoencoder that overcomes these limitations by leveraging consistency models. Music2Latent encodes samples into a compressed continuous latent space in a single end-to-end training process while enabling high-fidelity single-step reconstruction. Key innovations include conditioning the consistency model on upsampled encoder outputs at all levels through cross connections, using frequency-wise self-attention to capture long-range frequency dependencies, and employing frequency-wise learned scaling to handle varying value distributions…
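
Frequency-wise self-attention, as named in the abstract, can be read as attention computed along the frequency axis only, so that each frequency bin attends to every other bin within the same time frame. The NumPy sketch below shows that reading with a single head. The tensor layout, the projection matrices `w_q`, `w_k`, `w_v`, and the function name are assumptions made for illustration, not the paper's architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def frequency_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over the frequency axis.

    x: array of shape (frames, freq_bins, channels). Each time frame is
       handled independently; attention mixes information across
       frequency bins only.
    Returns the attended features and the (frames, freq_bins, freq_bins)
    attention weights.
    """
    q = x @ w_q  # (frames, freq_bins, dim)
    k = x @ w_k
    v = x @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1 over freq_bins
    return weights @ v, weights


rng = np.random.default_rng(0)
frames, freq_bins, channels = 4, 6, 8
x = rng.standard_normal((frames, freq_bins, channels))

# Random projections stand in for learned weights (assumption).
w_q, w_k, w_v = (
    rng.standard_normal((channels, channels)) / np.sqrt(channels)
    for _ in range(3)
)

out, weights = frequency_self_attention(x, w_q, w_k, w_v)
```

Restricting attention to one axis keeps the score matrix at `freq_bins × freq_bins` per frame instead of one matrix over every time-frequency position, while still letting distant frequency bins interact directly.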

Code & Models

Repositories

SonyCSLParis/music2latent (PyTorch, official)

Models

SonyCSLParis/music2latent (Hugging Face model, 14 likes)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies