ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling

Yi-Chiao Wu; Dejan Marković; Steven Krenn; Israel D. Gebru; Alexander Richard

arXiv:2502.02019·eess.AS·February 5, 2025

Open Access

TL;DR

ComplexDec is a neural audio codec that uses complex spectrum modeling to achieve high-fidelity, domain-robust audio compression at 48 kHz, outperforming baseline codecs in out-of-domain scenarios.

Contribution

The paper introduces a full-band (48 kHz) neural codec with complex spectral input and output that improves out-of-domain robustness at the same 24 kbps bitrate as the AudioDec and ScoreDec baselines, while training on only the 30-hour VCTK corpus.

Findings

01

Demonstrates superior out-of-domain robustness in objective evaluations.

02

Achieves high-fidelity 48 kHz audio at a bitrate of 24 kbps.

03

Outperforms baseline codecs in subjective listening tests.
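
To put the 24 kbps figure in context, the short calculation below compares it with uncompressed audio at the same 48 kHz sampling rate. It is only a back-of-the-envelope sketch: the 16-bit mono PCM reference is an assumption made here for illustration and is not stated on this page.

```python
# Rough bitrate arithmetic for a 48 kHz codec running at 24 kbps.
SAMPLE_RATE_HZ = 48_000        # from the summary above
CODEC_BITRATE_BPS = 24_000     # 24 kbps, from the summary above
PCM_BITS_PER_SAMPLE = 16       # ASSUMPTION: 16-bit mono PCM as the reference

# Bitrate of the uncompressed reference signal.
pcm_bitrate_bps = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE

# How many times smaller the coded stream is than the reference.
compression_ratio = pcm_bitrate_bps / CODEC_BITRATE_BPS

# Average number of coded bits spent per audio sample.
bits_per_sample = CODEC_BITRATE_BPS / SAMPLE_RATE_HZ

print(f"PCM reference: {pcm_bitrate_bps / 1000:.0f} kbps")
print(f"Compression ratio: {compression_ratio:.0f}x")
print(f"Coded bits per sample: {bits_per_sample}")
```

Under that assumption the codec spends half a bit per sample on average, which gives a sense of how much information the compression has to discard or predict.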

Abstract

Neural audio codecs have been widely adopted in audio-generative tasks because their compact and discrete representations are suitable for both large-language-model-style and regression-based generative models. However, most neural codecs struggle to model out-of-domain audio, resulting in error propagation to downstream generative tasks. In this paper, we first argue that information loss from codec compression degrades out-of-domain robustness. Then, we propose full-band 48 kHz ComplexDec with complex spectral input and output to ease the information loss while adopting the same 24 kbps bitrate as the baseline AudioDec and ScoreDec. Objective and subjective evaluations demonstrate the out-of-domain robustness of ComplexDec trained using only the 30-hour VCTK corpus.
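
As an illustration of what "complex spectral input and output" can mean in practice, the sketch below computes a short-time Fourier transform of a 48 kHz test signal, stores it as two real-valued channels (real and imaginary parts), and inverts it. It is a minimal NumPy example, not the ComplexDec architecture: the helper names `stft` and `istft`, the FFT size, the hop size, and the test signal are all assumptions chosen for illustration. It also shows that discarding phase (keeping magnitude only) loses information that the complex representation retains.

```python
import numpy as np

N_FFT = 512   # ASSUMPTION: illustrative FFT size, not taken from the paper
HOP = 128     # ASSUMPTION: illustrative hop size (75% overlap)
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window

def stft(x):
    """Complex STFT with shape (n_frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(spec):
    """Weighted overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=-1) * WINDOW
    length = N_FFT + HOP * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)

# One second of a two-tone test signal sampled at 48 kHz.
sr = 48_000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 15_000 * t)

spec = stft(x)

# Complex spectrum as a 2-channel real tensor: (real/imag, frames, bins).
features = np.stack([spec.real, spec.imag], axis=0)

# Rebuild the complex spectrum from the two channels and invert it.
recon = istft(features[0] + 1j * features[1])

# Magnitude-only variant: phase is thrown away before inversion.
recon_mag = istft(np.abs(spec))

# Compare away from the signal edges, where the window weights are near zero.
interior = slice(N_FFT, -N_FFT)
err_complex = np.max(np.abs(recon[interior] - x[interior]))
err_mag = np.max(np.abs(recon_mag[interior] - x[interior]))

print("feature shape:", features.shape)
print("max error, complex spectrum:", err_complex)
print("max error, magnitude only:  ", err_mag)
```

The complex round trip is numerically lossless in the interior of the signal, while the magnitude-only version is not, which is the general motivation for modeling the full complex spectrum rather than magnitude alone.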

Taxonomy

Topics: Neural Networks and Applications · Speech and Audio Processing · Music and Audio Processing