Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
Dimitrios Bralios, Jonah Casebeer, Paris Smaragdis

TL;DR
This paper introduces Re-Bottleneck, a post-hoc method for restructuring the latent space of a pre-trained neural audio autoencoder. It yields structured, application-specific representations without retraining the entire model.
Contribution
The paper presents a simple framework to impose user-defined structure on latent spaces of pre-trained autoencoders through a dedicated inner bottleneck trained with latent losses.
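As a rough illustration of this idea, the sketch below trains a small inner encoder/decoder pair on latents only, never decoding audio. It is a toy under stated assumptions, not the paper's implementation: the class `InnerBottleneck`, the linear layers, the near-identity initialization, and the random stand-in latents are all hypothetical. Only a latent reconstruction term is used here; a structure-specific loss (ordering, alignment, equivariance) would be added on the `inner` representation.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def near_identity(n, rng, scale=0.3):
    """Identity matrix plus small uniform noise (arbitrary init for this sketch)."""
    return [[(1.0 if i == j else 0.0) + scale * rng.uniform(-1, 1)
             for j in range(n)] for i in range(n)]

class InnerBottleneck:
    """Hypothetical linear inner encoder/decoder operating on frozen latents."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.enc = near_identity(dim, rng)
        self.dec = near_identity(dim, rng)

    def forward(self, z):
        inner = matvec(self.enc, z)      # re-structured latent
        z_hat = matvec(self.dec, inner)  # mapped back to the original latent space
        return inner, z_hat

    def latent_loss(self, z):
        """Mean squared error between the original and reconstructed latent."""
        _, z_hat = self.forward(z)
        return sum((a - b) ** 2 for a, b in zip(z_hat, z)) / len(z)

    def sgd_step(self, z, lr=0.05):
        """One gradient step on the latent-space MSE; no audio is decoded."""
        n = len(z)
        inner, z_hat = self.forward(z)
        r = [2.0 * (a - b) / n for a, b in zip(z_hat, z)]  # dL/dz_hat
        # Back-propagate through the decoder: dL/dinner = dec^T r
        g_inner = [sum(self.dec[i][j] * r[i] for i in range(n)) for j in range(n)]
        for i in range(n):
            for j in range(n):
                self.dec[i][j] -= lr * r[i] * inner[j]
                self.enc[i][j] -= lr * g_inner[i] * z[j]

rng = random.Random(1)
# Stand-ins for latents produced by a frozen pre-trained encoder (assumption).
latents = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(32)]

model = InnerBottleneck(dim=4)
loss_before = sum(model.latent_loss(z) for z in latents) / len(latents)
for _ in range(200):
    for z in latents:
        model.sgd_step(z)
loss_after = sum(model.latent_loss(z) for z in latents) / len(latents)
print(loss_before, loss_after)
```

Because the pre-trained encoder and decoder are never touched, the only trainable parameters are those of the inner pair, which is what makes the approach post-hoc.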
Findings
Enforces latent channel ordering without losing reconstruction quality.
Aligns latents with semantic embeddings to improve downstream tasks.
Enforces equivariance, so that filtering the input corresponds to a transformation of the latents.
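To make the first finding concrete: one common way to obtain an ordering over latent channels is to truncate the latent to a random prefix of its channels during training, so that earlier channels learn to carry the most information. This summary does not state how the paper enforces the ordering, so the following is an illustrative assumption; `prefix_mask`, `sample_prefix_length`, and the uniform sampling of the prefix length are hypothetical choices.

```python
import random

def prefix_mask(z, k):
    """Keep the first k latent channels and zero out the rest."""
    return [v if i < k else 0.0 for i, v in enumerate(z)]

def sample_prefix_length(num_channels, rng):
    """Draw how many leading channels to keep (uniform; an arbitrary choice)."""
    return rng.randint(1, num_channels)

rng = random.Random(0)
z = [0.9, -0.4, 0.2, 0.05]  # one frame of a hypothetical 4-channel latent
k = sample_prefix_length(len(z), rng)
z_truncated = prefix_mask(z, k)
print(k, z_truncated)
```

In a training loop, the truncated latent would be passed to the inner decoder and compared against the original latent, which penalizes any reliance on late channels for essential information.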
Abstract
Neural audio codecs and autoencoders have emerged as versatile models for audio compression, transmission, feature-extraction, and latent-space generation. However, a key limitation is that most are trained to maximize reconstruction fidelity, often neglecting the specific latent structure necessary for optimal performance in diverse downstream applications. We propose a simple, post-hoc framework to address this by modifying the bottleneck of a pre-trained autoencoder. Our method introduces a "Re-Bottleneck", an inner bottleneck trained exclusively through latent space losses to instill user-defined structure. We demonstrate the framework's effectiveness in three experiments. First, we enforce an ordering on latent channels without sacrificing reconstruction quality. Second, we align latents with semantic embeddings, analyzing the impact on downstream diffusion modeling. Third, we…
Taxonomy
Topics: Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing
