Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization
Bernardo Torres, Manuel Moussallam, Gabriel Meseguer-Brocal

TL;DR
This paper proposes a simple data augmentation-based training method to induce linearity in audio autoencoders, enabling more intuitive manipulation of audio representations without changing the model architecture.
Contribution
It introduces a novel training approach that promotes linearity in high-compression autoencoders, facilitating algebraic operations in the latent space for audio processing.
Findings
The trained CAE exhibits linear behavior in both the encoder and the decoder
Simple latent arithmetic supports music source composition and separation
Method preserves reconstruction fidelity
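To make the latent-arithmetic finding concrete, here is a minimal sketch of source separation by subtraction in a linear latent space. Everything in it is an assumption made for illustration: the encoder is a random matrix, the decoder is its pseudo-inverse, the "stems" are random signals, and the accompaniment latent is assumed to be known. It is not the paper's CAE or its evaluation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, latent_dim = 1024, 64

# Hypothetical stand-ins for a perfectly linear encoder/decoder pair.
E = rng.standard_normal((latent_dim, n_samples)) / np.sqrt(n_samples)
D = np.linalg.pinv(E)  # decoder as the pseudo-inverse of the encoder

def encode(x):
    """Map a waveform to a compressed latent (toy linear map)."""
    return E @ x

def decode(z):
    """Map a latent back to a waveform (toy linear map)."""
    return D @ z

# Toy "stems": random signals standing in for vocals and accompaniment.
vocals = rng.standard_normal(n_samples)
accomp = rng.standard_normal(n_samples)
mixture = vocals + accomp

# Separation by latent arithmetic: subtract the accompaniment latent
# from the mixture latent, then decode the difference.
vocals_est = decode(encode(mixture) - encode(accomp))

# With a linear encoder this equals the autoencoded vocals
# (not the original vocals, since the toy compression is lossy).
reference = decode(encode(vocals))
separation_error = float(np.max(np.abs(vocals_est - reference)))
print(f"max deviation from autoencoded vocals: {separation_error:.2e}")
```

In this toy setting the subtraction is exact up to floating-point error because the maps are linear by construction. A learned autoencoder would only approximate this, to the degree that training has made it linear.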
Abstract
Audio autoencoders learn useful, compressed audio representations, but their non-linear latent spaces prevent intuitive algebraic manipulation such as mixing or scaling. We introduce a simple training methodology to induce linearity in a high-compression Consistency Autoencoder (CAE) by using data augmentation, thereby inducing homogeneity (equivariance to scalar gain) and additivity (the decoder preserves addition) without altering the model's architecture or loss function. When trained with our method, the CAE exhibits linear behavior in both the encoder and decoder while preserving reconstruction fidelity. We test the practical utility of our learned space on music source composition and separation via simple latent arithmetic. This work presents a straightforward technique for constructing structured latent spaces, enabling more intuitive and efficient audio processing.
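The two properties named in the abstract can be checked numerically. The sketch below measures how far a map f is from homogeneity, f(a·x) = a·f(x), and from additivity, f(x + y) = f(x) + f(y). The two toy encoders (a random matrix, and the same matrix followed by tanh) and the helper names are hypothetical and chosen only to contrast a linear map with a non-linear one. They are not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 256)) / 16.0  # hypothetical toy weights

def linear_encoder(x):
    """A purely linear map: satisfies both properties exactly."""
    return W @ x

def nonlinear_encoder(x):
    """The same map followed by tanh: violates both properties."""
    return np.tanh(W @ x)

def homogeneity_gap(f, x, gain):
    """Largest absolute deviation between f(gain * x) and gain * f(x)."""
    return float(np.max(np.abs(f(gain * x) - gain * f(x))))

def additivity_gap(f, x, y):
    """Largest absolute deviation between f(x + y) and f(x) + f(y)."""
    return float(np.max(np.abs(f(x + y) - (f(x) + f(y)))))

x = rng.standard_normal(256)
y = rng.standard_normal(256)
gain = 0.5

lin_h = homogeneity_gap(linear_encoder, x, gain)
lin_a = additivity_gap(linear_encoder, x, y)
non_h = homogeneity_gap(nonlinear_encoder, x, gain)
non_a = additivity_gap(nonlinear_encoder, x, y)

print(f"linear:     homogeneity gap {lin_h:.2e}, additivity gap {lin_a:.2e}")
print(f"non-linear: homogeneity gap {non_h:.2e}, additivity gap {non_a:.2e}")
```

The same gap functions could in principle be applied to any encoder or decoder callable, which is one way to quantify how linear a trained model behaves.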
