Learning to Upsample and Upmix Audio in the Latent Domain

Dimitrios Bralios; Paris Smaragdis; Jonah Casebeer

arXiv:2506.00681 · cs.SD · September 10, 2025

TL;DR

This paper introduces a framework that performs audio upsampling and upmixing directly in the latent space of neural autoencoders, reporting efficiency gains of up to 100x while keeping quality comparable to raw-audio processing.

Contribution

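The paper proposes processing audio entirely in an autoencoder's latent domain, which simplifies training (a latent L1 reconstruction term plus a single latent adversarial discriminator) and reduces computational cost for audio enhancement tasks.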

Findings

01  Achieves up to 100x computational efficiency gains.

02  Maintains audio quality comparable to raw-audio processing.

03  Validates the approach on bandwidth extension and mono-to-stereo upmixing.

Abstract

Neural audio autoencoders create compact latent representations that preserve perceptually important information, serving as the foundation for both modern audio compression systems and generation approaches like next-token prediction and latent diffusion. Despite their prevalence, most audio processing operations, such as spatial and spectral up-sampling, still inefficiently operate on raw waveforms or spectral representations rather than directly on these compressed representations. We propose a framework that performs audio processing operations entirely within an autoencoder's latent space, eliminating the need to decode to raw audio formats. Our approach dramatically simplifies training by operating solely in the latent domain, with a latent L1 reconstruction term, augmented by a single latent adversarial discriminator. This contrasts sharply with raw-audio methods that typically…
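
The training objective named in the abstract (a latent L1 reconstruction term plus a single latent adversarial discriminator) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the hinge form of the adversarial loss, the `adv_weight` value, the frames-by-channels list layout, and the names `latent_l1`, `generator_loss`, `discriminator_loss`, and `toy_score` are all assumptions made for illustration. Only the overall "L1 plus one discriminator, both on latents" structure comes from the abstract.

```python
from typing import Callable, Sequence

# Assumed layout: a latent sequence is a list of frames, each a list of floats.
Latent = Sequence[Sequence[float]]


def latent_l1(pred: Latent, target: Latent) -> float:
    """Mean absolute error between two latent sequences of identical shape."""
    if len(pred) != len(target):
        raise ValueError("latents must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same number of channels")
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("latents must be non-empty")
    return total / count


def generator_loss(pred: Latent, target: Latent,
                   disc_score: Callable[[Latent], float],
                   adv_weight: float = 0.1) -> float:
    """Latent L1 term plus a weighted adversarial term.

    The adversarial term -D(pred) is the hinge-GAN generator loss,
    chosen here as an assumption; adv_weight is a hypothetical value.
    """
    return latent_l1(pred, target) + adv_weight * (-disc_score(pred))


def discriminator_loss(real: Latent, fake: Latent,
                       disc_score: Callable[[Latent], float]) -> float:
    """Hinge loss for a single discriminator that scores latents (assumed form)."""
    return (max(0.0, 1.0 - disc_score(real))
            + max(0.0, 1.0 + disc_score(fake)))


def toy_score(z: Latent) -> float:
    """Stand-in discriminator: the mean latent value (illustration only)."""
    values = [v for frame in z for v in frame]
    return sum(values) / len(values)


# Toy latents: 2 frames x 2 channels.
predicted = [[0.0, 1.0], [2.0, 3.0]]   # output of the latent-domain model
reference = [[0.0, 2.0], [2.0, 1.0]]   # encoder latent of the target audio

g_loss = generator_loss(predicted, reference, toy_score, adv_weight=0.5)
d_loss = discriminator_loss(reference, predicted, toy_score)
```

In this sketch no decoding to waveforms happens anywhere in the loss computation, which is the property the abstract emphasizes; in a real system the discriminator would be a trained network rather than `toy_score`.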

Taxonomy

Topics: Music and Audio Processing