BANC: Towards Efficient Binaural Audio Neural Codec for Overlapping Speech
Anton Ratnarajah, Shi-Xiong Zhang, Dong Yu

TL;DR
BANC is a neural binaural audio codec that efficiently compresses overlapping speech while preserving spatial cues, reducing bandwidth by nearly half and maintaining spatial accuracy in various acoustic scenarios.
Contribution
It introduces a novel architecture that separately compresses speech content and spatial cues, enabling effective overlapping speech compression with preserved spatial information.
Findings
Reduces the bandwidth needed for binaural speech by 48% compared with compressing the two binaural channels individually.
Accurately preserves spatial cues post-decoding.
Effective in both single-speaker and two-speaker scenarios, including overlapping speech.
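To make "spatial cues" concrete, here is a minimal sketch of two standard binaural cues, the interaural level difference (ILD) and the interaural time difference (ITD), measured on a toy stereo signal. This summary does not say which cues or metrics the paper actually uses, so the function names `ild_db` and `itd_samples`, the test signal, and the choice of cues are all illustrative assumptions rather than the authors' method.

```python
import math

def ild_db(left, right):
    """Interaural level difference in dB (positive means left is louder)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)

def itd_samples(left, right, max_lag):
    """Lag in samples that maximizes the cross-correlation.

    A positive result means the right channel lags the left channel.
    """
    n = len(left)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                val += left[i] * right[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

# Toy signal (an assumption for illustration): a Hann-windowed sine burst.
burst = [
    0.5 * (1.0 - math.cos(2.0 * math.pi * i / 63)) * math.sin(2.0 * math.pi * i / 16)
    for i in range(64)
]
# The right ear hears the same burst 3 samples later at half the amplitude.
left = burst + [0.0] * 8
right = [0.0] * 3 + [0.5 * x for x in burst] + [0.0] * 5

ild = ild_db(left, right)
itd = itd_samples(left, right, max_lag=8)
print(f"ILD: {ild:.2f} dB, ITD: {itd} samples")
```

A codec that preserves spatial cues should yield decoded channels whose ILD and ITD stay close to those of the input; comparing the two values before and after coding is one simple way to check this.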
Abstract
We introduce BANC, a neural binaural audio codec designed for efficient speech compression in single and two-speaker scenarios while preserving the spatial location information of each speaker. Our key contributions are as follows: 1) The ability of our proposed model to compress and decode overlapping speech. 2) A novel architecture that compresses speech content and spatial cues separately, ensuring the preservation of each speaker's spatial context after decoding. 3) BANC's proficiency in reducing the bandwidth required for compressing binaural speech by 48% compared to compressing individual binaural channels. In our evaluation, we employed speech enhancement, room acoustics, and perceptual metrics to assess the accuracy of BANC's clean speech and spatial cue estimates.
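The 48% figure is measured against a baseline that codes the left and right channels independently. The arithmetic below sketches what that implies for the bitrate. The per-channel rate of 6 kbps is a hypothetical placeholder, not a number from the paper, and `joint_bitrate_kbps` is an illustrative helper name.

```python
def joint_bitrate_kbps(per_channel_kbps: float, reduction: float = 0.48) -> float:
    """Bitrate of a joint binaural codec, given its fractional saving over
    coding the left and right channels independently."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be in [0, 1]")
    baseline = 2.0 * per_channel_kbps  # two independently coded channels
    return baseline * (1.0 - reduction)

# Hypothetical per-channel rate (assumption, not from the paper).
per_channel = 6.0
baseline = 2.0 * per_channel
joint = joint_bitrate_kbps(per_channel)

print(f"baseline: {baseline:.2f} kbps, joint: {joint:.2f} kbps")
print(f"joint cost relative to one channel: {joint / per_channel:.2f}x")
```

Under these assumptions, a 48% saving means the binaural stream costs about 1.04 times the rate of a single channel, so the spatial information is carried for roughly 4% extra bandwidth on top of one channel's content.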
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
