EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding

Luca Cerovaz; Michele Mancusi; Emanuele Rodolà

arXiv:2601.17517·cs.SD·January 29, 2026

TL;DR

EuleroDec is a complex-valued RVQ-VAE audio codec that preserves phase information and trains without adversarial discriminators. It matches or surpasses longer-trained baselines in-domain and achieves state-of-the-art out-of-domain performance, with roughly an order of magnitude less training time.

Contribution

The paper introduces a complex-valued neural codec that maintains magnitude-phase coupling without adversarial training or diffusion, improving efficiency and quality.

Findings

01. Matches or surpasses longer-trained baselines in-domain.

02. Achieves state-of-the-art out-of-domain performance.

03. Reduces training time by an order of magnitude.

Abstract

Audio codecs power discrete music generative modelling, music streaming and immersive media by shrinking PCM audio to bandwidth-friendly bit-rates. Recent works have gravitated towards processing in the spectral domain; however, spectrogram-domain approaches typically struggle with phase modeling, which is naturally complex-valued. Most frequency-domain neural codecs either disregard phase information or encode it as two separate real-valued channels, limiting spatial fidelity. To compensate for this inadequate representation of the audio signal, they introduce adversarial discriminators at the expense of convergence speed and training stability. In this work, we introduce an end-to-end complex-valued RVQ-VAE audio codec that preserves magnitude-phase coupling across the entire analysis-quantization-synthesis pipeline and removes adversarial discriminators and diffusion…
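
The abstract does not spell out how the quantizer works, so the following is only a minimal sketch of residual vector quantization (RVQ) applied directly to complex-valued vectors, rather than to separate real and imaginary channels. It is not the authors' implementation. The function names (`nearest_code`, `rvq_encode`, `rvq_decode`), the random codebooks, and the sizes `D`, `K` and `STAGES` are all hypothetical choices made for illustration.

```python
import numpy as np


def nearest_code(x, codebook):
    """Index of the codeword closest to x under the complex Euclidean distance.

    x: complex vector of shape (D,); codebook: complex array of shape (K, D).
    Using |c - x|^2 treats each entry as one complex number, so magnitude and
    phase are matched jointly instead of as independent real channels.
    """
    dists = np.sum(np.abs(codebook - x) ** 2, axis=1)
    return int(np.argmin(dists))


def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage encodes the previous residual."""
    residual = x.copy()
    indices = []
    for cb in codebooks:
        idx = nearest_code(residual, cb)
        indices.append(idx)
        residual = residual - cb[idx]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword from every stage."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))


# Hypothetical sizes: latent dimension, codewords per stage, number of stages.
D, K, STAGES = 4, 16, 3
rng = np.random.default_rng(0)

# Random complex codebooks (assumption: a trained codec would learn these).
# Later stages are scaled down because residuals tend to shrink.
codebooks = [
    (rng.standard_normal((K, D)) + 1j * rng.standard_normal((K, D))) * 0.5 ** s
    for s in range(STAGES)
]

x = rng.standard_normal(D) + 1j * rng.standard_normal(D)
indices, residual = rvq_encode(x, codebooks)
x_hat = rvq_decode(indices, codebooks)
```

By construction, `x_hat + residual` recovers `x` up to floating-point error, and `indices` holds one integer per stage, which is what a codec would transmit. With random codebooks the reconstruction error is not meaningful; the sketch only shows the data flow.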

Taxonomy

Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Music and Audio Processing