EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding
Luca Cerovaz, Michele Mancusi, Emanuele Rodolà

TL;DR
EuleroDec is a novel complex-valued RVQ-VAE audio codec that efficiently preserves phase information, eliminates adversarial training, and achieves state-of-the-art performance with reduced training time.
Contribution
It introduces a complex-valued neural codec that maintains magnitude-phase coupling without adversarial training or diffusion, improving efficiency and quality.
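To make "magnitude-phase coupling" concrete, here is a small contrast between a complex-valued linear layer and a real-valued layer that treats the real and imaginary parts of each spectral bin as two independent channels. This is a minimal illustration under stated assumptions, not EuleroDec's architecture. The helpers `complex_linear`, `two_channel_linear` and `rotate` are hypothetical, and equivariance to a global phase shift is used here as one concrete reading of "coupling". A bias-free complex layer satisfies W(e^{iθ}x) = e^{iθ}(Wx), so rotating every input bin rotates the output by the same angle and leaves magnitudes untouched. An unconstrained two-channel layer has no such guarantee.

```python
import cmath

# Hypothetical helpers for illustration only; these are not EuleroDec's layers.

def complex_linear(weights, x):
    """y_i = sum_j W_ij * x_j with complex weights and inputs (no bias term)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]

def two_channel_linear(w_re, w_im, x):
    """Real-valued layer acting on [Re(x), Im(x)] stacked as independent channels."""
    feats = [v.real for v in x] + [v.imag for v in x]
    out = []
    for a, b in zip(w_re, w_im):
        re = sum(ai * fi for ai, fi in zip(a, feats))
        im = sum(bi * fi for bi, fi in zip(b, feats))
        out.append(complex(re, im))
    return out

def rotate(x, theta):
    """Apply the same global phase shift e^{i*theta} to every bin."""
    return [v * cmath.exp(1j * theta) for v in x]

def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

theta = 0.7
x = [1 + 2j, -0.5 + 0.25j]  # two toy spectral bins

# Complex layer: layer(rotate(x)) equals rotate(layer(x)) up to float rounding.
W = [[0.5 + 0.5j, 1 - 2j], [0.3j, -1 + 0j]]
complex_gap = max_abs_diff(complex_linear(W, rotate(x, theta)),
                           rotate(complex_linear(W, x), theta))

# Two-channel layer: an arbitrary real weight matrix (here, "keep Re(x_0) only")
# breaks the property.
w_re = [[1.0, 0.0, 0.0, 0.0]]
w_im = [[0.0, 0.0, 0.0, 0.0]]
real_gap = max_abs_diff(two_channel_linear(w_re, w_im, rotate(x, theta)),
                        rotate(two_channel_linear(w_re, w_im, x), theta))

print(f"complex layer gap:     {complex_gap:.2e}")
print(f"two-channel layer gap: {real_gap:.2e}")
```

A real two-channel layer recovers this behaviour only if each 2×2 block of its weights is constrained to the rotation-scaling form [[a, -b], [b, a]], which is exactly multiplication by the complex number a + ib. A complex-valued layer builds that constraint in by construction.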
Findings
Matches or surpasses longer-trained baselines in-domain.
Achieves state-of-the-art out-of-domain performance.
Reduces training time by an order of magnitude.
Abstract
Audio codecs power discrete generative modelling of music, music streaming and immersive media by compressing PCM audio to bandwidth-friendly bit rates. Recent works have gravitated towards processing in the spectral domain; however, spectrogram-domain methods typically struggle to model phase, which is naturally complex-valued. Most frequency-domain neural codecs either disregard phase information or encode it as two separate real-valued channels, limiting spatial fidelity. This in turn forces the introduction of adversarial discriminators, at the expense of convergence speed and training stability, to compensate for an inadequate representation of the audio signal. In this work we introduce an end-to-end complex-valued RVQ-VAE audio codec that preserves magnitude-phase coupling across the entire analysis-quantization-synthesis pipeline and removes adversarial discriminators and diffusion…
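The quantization stage of an RVQ-VAE can be sketched as residual vector quantization (RVQ) over complex-valued latent vectors, together with the usual bitrate arithmetic for RVQ codes. This is a minimal sketch under assumptions, not the paper's quantizer. The codebooks are random rather than learned, each one includes the zero vector so that no stage can increase the error, and nearest-neighbour search uses the squared distance sum_i |u_i - v_i|^2 in C^d. The helper names (`make_codebooks`, `rvq_encode`, `rvq_decode`, `bitrate_bps`) and the frame-rate, codebook-count and codebook-size values are illustrative, not EuleroDec's configuration.

```python
import math
import random

# Illustrative sketch only: random codebooks stand in for learned ones.

def sq_dist(u, v):
    """Squared distance in C^d: sum_i |u_i - v_i|^2."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v))

def make_codebooks(num_stages, size, dim, seed=0):
    """Hypothetical random complex codebooks. Each includes the zero vector,
    and later stages use a smaller scale to mimic progressively finer residuals."""
    rng = random.Random(seed)
    books = []
    for s in range(num_stages):
        scale = 0.5 ** s
        book = [[0j] * dim]
        for _ in range(size - 1):
            book.append([complex(rng.gauss(0, scale), rng.gauss(0, scale))
                         for _ in range(dim)])
        books.append(book)
    return books

def rvq_encode(x, codebooks):
    """Greedy residual VQ: each stage quantizes what earlier stages missed.
    Returns one code index per stage and the squared error after each stage."""
    residual = list(x)
    codes, errors = [], []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda k: sq_dist(residual, book[k]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
        errors.append(sq_dist(residual, [0j] * len(residual)))
    return codes, errors

def rvq_decode(codes, codebooks):
    """Reconstruction is the sum of the selected codewords across stages."""
    out = [0j] * len(codebooks[0][0])
    for idx, book in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, book[idx])]
    return out

def bitrate_bps(frame_rate_hz, num_codebooks, codebook_size):
    """Bits per second = frames/s * codes per frame * bits per code."""
    return frame_rate_hz * num_codebooks * math.log2(codebook_size)

books = make_codebooks(num_stages=4, size=16, dim=8)
rng = random.Random(1)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]  # toy latent
codes, errors = rvq_encode(x, books)
x_hat = rvq_decode(codes, books)

print("codes:", codes)
print("squared error after each stage:", [round(e, 3) for e in errors])
print("bitrate, 75 frames/s with 8 x 1024-entry codebooks:",
      bitrate_bps(75, 8, 1024), "bps")
```

Each stage spends log2(K) bits per frame on a K-entry codebook, so the bitrate grows linearly with the number of RVQ stages (75 × 8 × 10 = 6000 bps in the example). Truncating the list of stages at decode time is a common way RVQ codecs trade quality for bitrate.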
Taxonomy
Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Music and Audio Processing
