Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders

Dimitrios Bralios; Jonah Casebeer; Paris Smaragdis

arXiv:2507.07867 · cs.SD · September 10, 2025

Open Access

TL;DR

This paper introduces Re-Bottleneck, a post-hoc method that restructures the latent space of a pre-trained neural audio autoencoder, yielding structured, application-specific representations without retraining the entire model.

Contribution

The paper presents a simple framework for imposing user-defined structure on the latent space of a pre-trained autoencoder. It adds a dedicated inner bottleneck that is trained exclusively through latent-space losses.
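As a rough illustration of what "an inner bottleneck trained with latent losses" can look like, the sketch below fits a small inner encoder/decoder on precomputed latents and never touches audio or the outer autoencoder. Everything in it is an assumption made for illustration: the function names are hypothetical, the inner maps are plain linear layers, the array `z` stands in for the output of a frozen pre-trained encoder, and nested dropout is used as one common way to induce a channel ordering. It is not the authors' architecture or training recipe.

```python
import numpy as np


def train_inner_bottleneck(z, steps=2000, lr=0.05, seed=0):
    """Fit a linear inner encoder/decoder on frozen latents z of shape (n, d).

    Assumption: z was produced by a frozen, pre-trained encoder. The only
    loss is a latent-space reconstruction error; no audio is involved.
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    w_enc = rng.normal(scale=0.1, size=(d, d))  # inner encoder weights
    w_dec = rng.normal(scale=0.1, size=(d, d))  # inner decoder weights
    for _ in range(steps):
        # Nested dropout (an assumed choice): for each example keep only the
        # first k channels, so early channels must carry the most information.
        k = rng.integers(1, d + 1, size=n)
        mask = (np.arange(d)[None, :] < k[:, None]).astype(z.dtype)
        h = z @ w_enc            # re-structured latents
        hm = h * mask            # truncated re-structured latents
        z_hat = hm @ w_dec       # reconstruction of the original latents
        # Gradient of the per-example squared error, averaged over the batch.
        g_out = 2.0 * (z_hat - z) / n
        g_dec = hm.T @ g_out
        g_enc = z.T @ ((g_out @ w_dec.T) * mask)
        w_enc -= lr * g_enc
        w_dec -= lr * g_dec
    return w_enc, w_dec


def truncated_error(z, w_enc, w_dec, k):
    """Mean squared latent error when only the first k channels are kept."""
    h = z @ w_enc
    h[:, k:] = 0.0
    return float(np.mean((h @ w_dec - z) ** 2))
```

Calling `truncated_error` for increasing `k` gives a quick check of whether the learned channels are ordered: under the assumptions above, keeping more leading channels should generally reduce the latent reconstruction error.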

Findings

01 Enforces an ordering on latent channels without sacrificing reconstruction quality.

02 Aligns latents with semantic embeddings and analyzes the impact on downstream diffusion modeling.

03 Implements equivariance, so that filtering the input corresponds to a transformation in the latent space.
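The equivariance finding can be read as a latent-space loss. One generic way to write such an objective is given below. The symbols are assumptions introduced here for illustration and are not notation taken from the paper: $E$ is the frozen pre-trained encoder, $R$ is the inner bottleneck encoder, $f$ is a filter applied to the input signal $x$, and $T_f$ is the latent transformation paired with $f$.

```latex
\mathcal{L}_{\text{eq}}
  = \mathbb{E}_{x,\,f}\left[
      \left\lVert R\big(E(f * x)\big) - T_f\big(R(E(x))\big) \right\rVert_2^2
    \right]
```

The loss is zero when encoding the filtered input gives the same result as transforming the encoding of the unfiltered input, which is what equivariance with respect to $f$ means in this sketch.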

Abstract

Neural audio codecs and autoencoders have emerged as versatile models for audio compression, transmission, feature-extraction, and latent-space generation. However, a key limitation is that most are trained to maximize reconstruction fidelity, often neglecting the specific latent structure necessary for optimal performance in diverse downstream applications. We propose a simple, post-hoc framework to address this by modifying the bottleneck of a pre-trained autoencoder. Our method introduces a "Re-Bottleneck", an inner bottleneck trained exclusively through latent space losses to instill user-defined structure. We demonstrate the framework's effectiveness in three experiments. First, we enforce an ordering on latent channels without sacrificing reconstruction quality. Second, we align latents with semantic embeddings, analyzing the impact on downstream diffusion modeling. Third, we…

Taxonomy

Topics: Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing