SAME: A Semantically-Aligned Music Autoencoder

Julian D. Parker; Zach Evans; CJ Carr; Zachary Zukowski; Josiah Taylor; Matthew Rice; Jordi Pons

arXiv:2605.18613·cs.SD·May 19, 2026

TL;DR

SAME is an autoencoder for stereo music and general audio that reaches a 4096× temporal compression ratio while maintaining reconstruction quality and downstream generative performance, using a transformer-based backbone and semantic regularization.

Contribution

Introduces SAME, a transformer-based autoencoder with semantic regularization for highly compressed, high-quality stereo music and audio reconstruction and generation.

Findings

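01. Reaches a 4096× temporal compression ratio while maintaining reconstruction quality and downstream generative performance.

02. Reduces computational cost through both the high compression ratio and the use of well-optimised transformer primitives.

03. Releases open weights for two model variants: a large SAME-L and a CPU-deployable SAME-S.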

Abstract

Latent representations are at the heart of the majority of modern generative models. In the audio domain they are typically produced by a neural-audio-codec autoencoder. In this work we introduce SAME (Semantically-Aligned Music autoEncoder), an autoencoder for stereo music and general audio that reaches a 4096× temporal compression ratio while maintaining reconstruction quality and downstream generative performance. We achieve this by combining a transformer-based backbone with a set of semantic regularisation approaches, phase-aware reconstruction losses and improved discriminator designs. The architecture delivers substantial computational cost benefits through both its high compression ratio and its reliance on well-optimised transformer primitives. Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.
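
To give a feel for what a 4096× temporal compression ratio means in practice, the following sketch computes the latent frame rate and the number of latent frames for a clip. It is illustrative arithmetic only, not code from the paper. The 44.1 kHz sample rate, the 30-second clip length, the rounding-up of a partial final frame, and the function names `latent_rate_hz` and `latent_frames` are all assumptions made here; the abstract states only the compression ratio.

```python
# Illustrative arithmetic only; not taken from the SAME codebase.
# Assumption: "4096x temporal compression" means one latent frame
# per 4096 input audio samples along the time axis.

COMPRESSION_RATIO = 4096  # from the abstract


def latent_rate_hz(sample_rate: int, ratio: int = COMPRESSION_RATIO) -> float:
    """Latent frames per second for a given audio sample rate."""
    return sample_rate / ratio


def latent_frames(num_samples: int, ratio: int = COMPRESSION_RATIO) -> int:
    """Latent sequence length for a clip.

    Assumption: a partial final frame is padded, so we round up.
    """
    return -(-num_samples // ratio)  # integer ceiling division


if __name__ == "__main__":
    sample_rate = 44_100   # assumed sample rate, not stated in the abstract
    clip_seconds = 30      # hypothetical clip length
    num_samples = sample_rate * clip_seconds

    print(f"latent rate: {latent_rate_hz(sample_rate):.2f} Hz")
    print(f"latent frames for {clip_seconds} s: {latent_frames(num_samples)}")
```

Under these assumptions, a downstream generative model would operate on a sequence of a few hundred latent frames for a 30-second clip instead of over a million audio samples per channel, which is one source of the computational savings the abstract describes.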
