Music2Latent: Consistency Autoencoders for Latent Audio Compression
Marco Pasini, Stefan Lattner, George Fazekas

TL;DR
Music2Latent introduces a novel end-to-end consistency autoencoder for audio that achieves high-fidelity, single-step reconstruction and outperforms existing audio autoencoders in sound quality and reconstruction accuracy, advancing generative audio and MIR applications.
Contribution
It presents the first successful end-to-end consistency autoencoder for audio, integrating frequency-wise self-attention and learned scaling for improved performance.
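The two frequency-wise operations named above can be illustrated with a minimal sketch. It assumes a feature map laid out as `feat[t][f][c]` (time frame, frequency bin, channel). The function names `frequency_wise_attention` and `frequency_wise_scaling` are hypothetical. The attention is simplified to a single head with identity query/key/value projections rather than learned ones, and the per-bin scale is a plain list standing in for trainable parameters. This is an interpretation of the terms, not the paper's actual layers.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def frequency_wise_attention(feat):
    """Self-attention across frequency bins, computed separately for each time frame.

    feat[t][f] is a C-dim vector. Assumption: single head, identity Q/K/V
    projections (a real layer would learn these)."""
    out = []
    for frame in feat:                       # frames never attend to each other
        dim = len(frame[0])
        scale = 1.0 / math.sqrt(dim)
        new_frame = []
        for q in frame:                      # each frequency bin is a query token
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in frame]
            weights = softmax(scores)
            mixed = [sum(w * v[c] for w, v in zip(weights, frame)) for c in range(dim)]
            new_frame.append(mixed)
        out.append(new_frame)
    return out


def frequency_wise_scaling(feat, scales):
    """Multiply every vector in frequency bin f by its own scalar scales[f].

    Assumption: `scales` stands in for learned per-bin parameters."""
    return [[[scales[f] * x for x in vec] for f, vec in enumerate(frame)] for frame in feat]


# Usage: 2 time frames x 3 frequency bins x 2 channels.
feat = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[2.0, 2.0], [0.5, -0.5], [0.0, 0.0]]]
out = frequency_wise_scaling(frequency_wise_attention(feat), [1.0, 2.0, 0.5])
```

Restricting attention to the frequency axis keeps the cost of each frame quadratic in the number of bins rather than in the full time-frequency grid. This is one plausible reason to attend along frequency only when the goal is to capture long-range frequency dependencies.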
Findings
Outperforms existing autoencoders in sound quality and reconstruction accuracy
Enables high-fidelity single-step audio reconstruction
Achieves competitive results on MIR tasks
Abstract
Efficient audio representations in a compressed continuous latent space are critical for generative audio modeling and Music Information Retrieval (MIR) tasks. However, some existing audio autoencoders have limitations, such as multi-stage training procedures, slow iterative sampling, or low reconstruction quality. We introduce Music2Latent, an audio autoencoder that overcomes these limitations by leveraging consistency models. Music2Latent encodes samples into a compressed continuous latent space in a single end-to-end training process while enabling high-fidelity single-step reconstruction. Key innovations include conditioning the consistency model on upsampled encoder outputs at all levels through cross connections, using frequency-wise self-attention to capture long-range frequency dependencies, and employing frequency-wise learned scaling to handle varying value distributions…
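Single-step reconstruction with a consistency model can be sketched using the skip/output parameterization from the original consistency-models work by Song et al. It assumes Music2Latent uses a similar form, which this summary does not confirm. The noise constants are the common defaults from that literature, not values taken from the paper. The `network` argument is a hypothetical placeholder for the latent-conditioned decoder, and the cross connections that feed upsampled encoder outputs to every decoder level are not modeled.

```python
import math
import random

# Assumed noise-schedule constants (consistency-model defaults,
# not values confirmed for Music2Latent).
SIGMA_DATA = 0.5
SIGMA_MIN = 0.002
SIGMA_MAX = 80.0


def c_skip(sigma):
    # Weight on the noisy input; equals 1 at SIGMA_MIN.
    return SIGMA_DATA ** 2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA ** 2)


def c_out(sigma):
    # Weight on the network output; equals 0 at SIGMA_MIN.
    return (sigma - SIGMA_MIN) * SIGMA_DATA / math.sqrt(sigma ** 2 + SIGMA_DATA ** 2)


def consistency_fn(x_noisy, sigma, latent, network):
    """f(x, sigma | z) = c_skip(sigma) * x + c_out(sigma) * F(x, sigma, z).

    `network` is a hypothetical stand-in for the latent-conditioned decoder;
    the boundary condition f(x, SIGMA_MIN | z) = x holds by construction."""
    raw = network(x_noisy, sigma, latent)
    return [c_skip(sigma) * xi + c_out(sigma) * ri for xi, ri in zip(x_noisy, raw)]


def decode_single_step(latent, network, length, rng):
    """Draw pure noise at SIGMA_MAX and map it to a sample with one network call."""
    x_max = [rng.gauss(0.0, SIGMA_MAX) for _ in range(length)]
    return consistency_fn(x_max, SIGMA_MAX, latent, network)


def toy_network(x, sigma, z):
    # Hypothetical toy decoder: ignores the noisy input and tiles the latent.
    return [z[i % len(z)] for i in range(len(x))]


audio = decode_single_step([0.1, -0.2], toy_network, length=4, rng=random.Random(0))
print(len(audio))  # 4
```

In this parameterization, `c_skip(SIGMA_MAX)` is close to 0. The single-step output is therefore dominated by the network's prediction, which is why one evaluation can yield a reconstruction once the network has been trained.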
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies
