UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction
Zhisheng Zhang, Xiang Li, Yixuan Zhou, Jing Peng, Shengbo Cai, Guoyang Zeng, Zhiyong Wu

TL;DR
UniSRCodec is a single-codebook neural audio codec that achieves high-fidelity, low-bitrate compression across a wide frequency range by combining time-frequency compression, sub-band reconstruction, and phase recovery. It outperforms existing cross-domain single-codebook codecs.
Contribution
Introduces UniSRCodec, a unified single-codebook audio codec that supports high sampling rates and high fidelity, built on sub-band reconstruction and phase recovery techniques.
Findings
Achieves state-of-the-art performance among cross-domain single-codebook codecs.
Maintains high reconstruction quality at a token rate of only 40.
Achieves quality comparable to multi-codebook methods.
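To relate a token rate to a bitrate, the usual arithmetic for a fixed-rate codec is tokens per second times codebooks times bits per token. The sketch below is illustrative only. It assumes the token rate of 40 is measured in tokens per second, and the codebook sizes and the multi-codebook configuration are hypothetical values chosen for the example, not figures taken from the paper.

```python
import math


def bitrate_bps(tokens_per_second: float, codebook_size: int, num_codebooks: int = 1) -> float:
    """Bits per second for a codec that emits fixed-rate tokens.

    Each token indexes one entry of a codebook, so it carries
    log2(codebook_size) bits. With several codebooks, every frame
    emits one token per codebook.
    """
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * num_codebooks * bits_per_token


# Hypothetical single-codebook setup: 40 tokens/s, 8192-entry codebook (assumed size).
single = bitrate_bps(40, 8192)

# Hypothetical multi-codebook setup for contrast: 75 frames/s, 8 codebooks of 1024 entries.
multi = bitrate_bps(75, 1024, num_codebooks=8)

print(f"single-codebook: {single / 1000:.2f} kbps")
print(f"multi-codebook:  {multi / 1000:.2f} kbps")
```

Under these assumed settings, the single-codebook stream needs roughly a tenth of the bits of the multi-codebook one, which is why a low token rate with one codebook translates directly into a low bitrate.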
Abstract
Neural Audio Codecs (NACs) reduce transmission overhead through compact compression and reconstruction, and also aim to bridge the gap between continuous and discrete signals. Existing NACs fall into two categories: multi-codebook and single-codebook codecs. Multi-codebook codecs face challenges such as structural complexity and difficulty in adapting to downstream tasks, while single-codebook codecs, though structurally simpler, suffer from low fidelity, ineffective modeling of unified audio, and an inability to model high-frequency audio. We propose UniSRCodec, a single-codebook codec that supports high sampling rates, low bandwidth, high fidelity, and unified audio modeling. We analyze the inefficiency of waveform-based compression and introduce a time and frequency compression method using the Mel-spectrogram, which cooperates with a vocoder to recover…
Taxonomy
Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Speech Recognition and Synthesis
