Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization

Bernardo Torres; Manuel Moussallam; Gabriel Meseguer-Brocal

arXiv:2510.23530·cs.SD·January 29, 2026

TL;DR

This paper proposes a simple data augmentation-based training method to induce linearity in audio autoencoders, enabling more intuitive manipulation of audio representations without changing the model architecture.

Contribution

The paper introduces a training approach, based on data augmentation, that promotes linearity (homogeneity and additivity) in a high-compression Consistency Autoencoder, so that operations such as mixing and scaling can be carried out algebraically in the latent space.

Findings

01. CAE exhibits linear behavior in encoder and decoder

02. Latent space arithmetic improves music source separation

03. Method preserves reconstruction fidelity
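
To make the second finding concrete, here is a minimal sketch of what composition and separation by latent arithmetic could look like. It does not use the paper's CAE: the `encode` and `decode` functions, the orthonormal basis, and the "vocals"/"accompaniment" signals are all hypothetical stand-ins, chosen so that the toy codec is exactly linear and the arithmetic is easy to verify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy codec (NOT the paper's CAE): signals live in a 16-dimensional
# subspace of a 256-sample frame, spanned by the orthonormal columns of `basis`.
basis, _ = np.linalg.qr(rng.standard_normal((256, 16)))


def encode(x):
    """Project a 256-sample frame onto the 16-dimensional latent space."""
    return basis.T @ x


def decode(z):
    """Map a 16-dimensional latent back to a 256-sample frame."""
    return basis @ z


# Two synthetic "sources" and their mixture in the signal domain.
vocals = basis @ rng.standard_normal(16)
accompaniment = basis @ rng.standard_normal(16)
mix = vocals + accompaniment

# Composition: add the latents, then decode once.
composed = decode(encode(vocals) + encode(accompaniment))

# Separation: subtract a known source's latent from the mixture's latent.
residual = decode(encode(mix) - encode(vocals))

print("composition error:", np.linalg.norm(composed - mix))
print("separation error:", np.linalg.norm(residual - accompaniment))
```

For an exactly linear codec both errors are at floating-point precision. For a learned autoencoder, the same two operations are only meaningful to the extent that the encoder and decoder behave linearly, which is what the training method aims to induce.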

Abstract

Audio autoencoders learn useful, compressed audio representations, but their non-linear latent spaces prevent intuitive algebraic manipulation such as mixing or scaling. We introduce a simple training methodology to induce linearity in a high-compression Consistency Autoencoder (CAE) by using data augmentation, thereby inducing homogeneity (equivariance to scalar gain) and additivity (the decoder preserves addition) without altering the model's architecture or loss function. When trained with our method, the CAE exhibits linear behavior in both the encoder and decoder while preserving reconstruction fidelity. We test the practical utility of our learned space on music source composition and separation via simple latent arithmetic. This work presents a straightforward technique for constructing structured latent spaces, enabling more intuitive and efficient audio processing.
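
The two properties named in the abstract can be checked numerically for any encoder or decoder. The sketch below is an illustration under stated assumptions, not the paper's evaluation protocol: the helper names `homogeneity_error` and `additivity_error`, the relative-error metric, and the toy linear and `tanh` maps are all hypothetical.

```python
import numpy as np


def homogeneity_error(encode, x, gain):
    """Relative gap between encode(gain * x) and gain * encode(x)."""
    lhs = encode(gain * x)
    rhs = gain * encode(x)
    return float(np.linalg.norm(lhs - rhs) / (np.linalg.norm(rhs) + 1e-12))


def additivity_error(decode, z1, z2):
    """Relative gap between decode(z1 + z2) and decode(z1) + decode(z2)."""
    lhs = decode(z1 + z2)
    rhs = decode(z1) + decode(z2)
    return float(np.linalg.norm(lhs - rhs) / (np.linalg.norm(rhs) + 1e-12))


rng = np.random.default_rng(0)
w_enc = rng.standard_normal((16, 256))  # toy encoder weights: 256 samples -> 16 latents
w_dec = rng.standard_normal((256, 16))  # toy decoder weights: 16 latents -> 256 samples

x = rng.standard_normal(256)
z1 = rng.standard_normal(16)
z2 = rng.standard_normal(16)

# A purely linear map satisfies both properties up to rounding error.
lin_hom = homogeneity_error(lambda v: w_enc @ v, x, gain=0.5)
lin_add = additivity_error(lambda z: w_dec @ z, z1, z2)

# Adding a tanh non-linearity breaks both properties.
tanh_hom = homogeneity_error(lambda v: np.tanh(w_enc @ v), x, gain=0.5)
tanh_add = additivity_error(lambda z: np.tanh(w_dec @ z), z1, z2)

print("linear:", lin_hom, lin_add)
print("tanh:  ", tanh_hom, tanh_add)
```

Passing a trained model's encoder and decoder in place of the toy maps would give a rough measure of how close that model is to linear on a given input.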
