BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction
Haoran Wang, Jiatong Shi, Jinchuan Tian, Bohan Li, Kai Yu, Shinji Watanabe

TL;DR
BSCodec is a neural audio codec that splits the audio spectrum into bands and compresses each band independently, improving reconstruction quality across speech, music, and general sound while keeping speech quality competitive.
Contribution
Introduces BSCodec, a novel neural audio codec architecture that processes spectral bands separately, addressing the limitations of speech-optimized codecs on diverse audio types.
Findings
Achieves superior reconstruction quality across diverse audio content.
Maintains competitive speech quality while improving music and sound fidelity.
Shows strong potential for downstream audio applications.
Abstract
Neural audio codecs have recently enabled high-fidelity reconstruction at high compression rates, especially for speech. However, speech and non-speech audio exhibit fundamentally different spectral characteristics: speech energy concentrates in narrow bands around pitch harmonics (80-400 Hz), while non-speech audio requires faithful reproduction across the full spectrum, particularly preserving higher frequencies that define timbre and texture. This poses a challenge: speech-optimized neural codecs suffer degradation on music or sound. Treating the full spectrum holistically is suboptimal: frequency bands have vastly different information density and perceptual importance by content type, yet full-band approaches apply uniform capacity across frequencies without accounting for these acoustic structures. To address this gap, we propose BSCodec (Band-Split Codec), a novel neural audio…
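To make the band-split idea concrete, here is a minimal sketch of dividing a one-sided spectrum into contiguous frequency bands and merging them back. It is not the authors' implementation: the helper names `split_bands` and `merge_bands`, the 16 kHz sample rate, and the band edges at 1 kHz and 4 kHz are all assumptions chosen for illustration. In a band-split codec each band would presumably be encoded and quantized on its own; this sketch shows only the split and merge steps, which are lossless by themselves.

```python
import numpy as np

def split_bands(spectrum, sample_rate, edges_hz):
    """Split a one-sided spectrum (rFFT bins) into contiguous frequency bands.

    edges_hz are hypothetical band boundaries, not values from the paper.
    """
    n_bins = spectrum.shape[-1]
    nyquist = sample_rate / 2.0
    # Bin k sits at k * nyquist / (n_bins - 1) Hz, so invert that mapping.
    cuts = [int(round(f / nyquist * (n_bins - 1))) for f in edges_hz]
    bounds = [0] + cuts + [n_bins]
    return [spectrum[..., lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]

def merge_bands(bands):
    """Concatenate the bands back into one full spectrum."""
    return np.concatenate(bands, axis=-1)

# Toy frame: a low tone (220 Hz) plus a weaker high tone (5 kHz).
sample_rate = 16000
t = np.arange(1024) / sample_rate
frame = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 5000 * t)

spectrum = np.fft.rfft(frame)  # 513 complex bins for a 1024-sample frame
bands = split_bands(spectrum, sample_rate, edges_hz=[1000.0, 4000.0])

# A real codec would compress each band separately at this point.
restored = np.fft.irfft(merge_bands(bands), n=frame.size)
```

Because the bands here are passed through unchanged, `restored` matches `frame` up to floating-point error. Any reconstruction loss in an actual codec would come from the per-band compression, which is where capacity could be allocated unevenly across frequencies.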
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
