UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, ShiLiang Zhang, Haizhou Li

TL;DR
UniCodec introduces a single, adaptive codebook for multi-domain audio compression, leveraging domain-specific strategies and self-supervised learning to outperform existing codecs across speech, music, and sound.
Contribution
UniCodec is a unified audio codec that combines a partitioned domain-adaptive codebook with a domain Mixture-of-Experts strategy, so that a single model with a single codebook can handle speech, music, and sound.
Findings
Outperforms existing unified neural codecs in audio reconstruction.
Surpasses state-of-the-art domain-specific codecs in acoustic and semantic tasks.
Achieves excellent performance across speech, music, and sound domains.
Abstract
The emergence of audio language models is empowered by neural audio codecs, which establish critical mappings between continuous waveforms and discrete tokens compatible with language model paradigms. The evolutionary trends from multi-layer residual vector quantizer to single-layer quantizer are beneficial for language-autoregressive decoding. However, the capability to handle multi-domain audio signals through a single codebook remains constrained by inter-domain distribution discrepancies. In this work, we introduce UniCodec, a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound. To achieve this, we propose a partitioned domain-adaptive codebook method and domain Mixture-of-Experts strategy to capture the distinct characteristics of each audio domain. Furthermore, to enrich the semantic density of the codec without…
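The idea of a partitioned single codebook can be illustrated with a minimal sketch. This is not the authors' implementation: the partition boundaries, the codebook size, the `PARTITIONS` table, and the `quantize` helper are all hypothetical, and the sketch assumes each domain searches only its own contiguous index range of one shared codebook.

```python
import numpy as np

# Hypothetical layout: one shared codebook, split into contiguous
# index ranges per domain. The sizes here are illustrative only.
PARTITIONS = {"speech": (0, 4), "music": (4, 8), "sound": (8, 16)}


def quantize(frames, codebook, domain):
    """Map each frame to the nearest codebook entry within its domain's range.

    frames:   (T, D) array of continuous encoder outputs
    codebook: (K, D) array shared by all domains
    Returns a (T,) array of indices into the full codebook.
    """
    start, stop = PARTITIONS[domain]
    sub = codebook[start:stop]  # entries this domain may use
    # Squared Euclidean distance between every frame and every allowed entry.
    dists = ((frames[:, None, :] - sub[None, :, :]) ** 2).sum(axis=-1)
    local = dists.argmin(axis=1)  # index within the partition
    return local + start  # shift back to a global token id


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))
frames = rng.normal(size=(5, 8))
tokens = quantize(frames, codebook, "music")
```

Because every domain emits ids from the same global index space, a downstream language model sees a single token vocabulary, while the restricted search keeps each domain's frames within its own region of the codebook.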
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
