SpatialCodec: Neural Spatial Speech Coding
Zhongweiyang Xu, Yong Xu, Vinay Kothapally, Heming Wang, Muqiao Yang, Dong Yu

TL;DR
SpatialCodec introduces a neural spatial audio coding framework that efficiently compresses multi-channel speech recordings while preserving spatial cues, enabling accurate reconstruction and superior spatial performance compared to existing methods.
Contribution
The paper presents a novel neural spatial audio coding framework combining a low-bit-rate sub-band codec with a spatial information encoder, along with new metrics for spatial cue evaluation.
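To show what "relative spatial information" means, the sketch below uses a classical stand-in rather than SpatialCodec's learned neural encoder. For each frequency bin, it estimates a least-squares relative transfer function (RTF) between a non-reference channel and the reference channel. It then rebuilds that channel from the reference STFT alone. The function names, the nested-list STFT layout (`[frames][freq_bins]` of complex values) and the RTF formulation are illustrative assumptions, not the paper's method.

```python
from typing import List

# Spec[t][f]: complex STFT value of one channel at frame t, frequency bin f
Spec = List[List[complex]]


def estimate_rtf(ref: Spec, other: Spec, eps: float = 1e-12) -> List[complex]:
    """Hypothetical helper: least-squares RTF per frequency bin,
    H(f) = sum_t X_m(t, f) * conj(X_ref(t, f)) / sum_t |X_ref(t, f)|^2."""
    n_frames, n_bins = len(ref), len(ref[0])
    rtf = []
    for f in range(n_bins):
        num = sum(other[t][f] * ref[t][f].conjugate() for t in range(n_frames))
        den = sum(abs(ref[t][f]) ** 2 for t in range(n_frames))
        rtf.append(num / (den + eps))  # eps guards against silent bins
    return rtf


def reconstruct_channel(ref: Spec, rtf: List[complex]) -> Spec:
    """Hypothetical helper: rebuild a non-reference channel by applying
    the per-bin RTF to the (possibly decoded) reference-channel STFT."""
    return [[rtf[f] * frame[f] for f in range(len(frame))] for frame in ref]
```

In a coder built this way, only the reference channel and the per-bin RTFs would be transmitted. A single time-invariant RTF per bin can only model a static source in a fixed acoustic scene, so this stand-in is much less expressive than a learned spatial encoder.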
Findings
Achieves a high compression ratio while accurately preserving spatial cues.
Outperforms high-bitrate baselines and black-box neural architectures in spatial performance.
Provides publicly available demos, code, and models.
Abstract
In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques, with the aim of preserving and accurately reconstructing the crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio by leveraging a single-channel neural sub-band codec and SpatialCodec. Our approach has two phases: (i) a neural sub-band codec encodes the reference channel at a low bit rate, and (ii) SpatialCodec captures relative spatial information for accurate multi-channel reconstruction at the decoder. We also propose novel evaluation metrics to assess spatial cue preservation: (i) spatial similarity, which computes cosine similarity in a spatially intuitive beamspace, and (ii) beamformed audio quality. Our system shows superior…
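The spatial-similarity metric named in the abstract can be sketched as follows, under several assumptions that may not match the paper's exact definition. The beamspace is taken to be delay-and-sum beam power over a grid of look angles and frequency bins, summed over frames, for a far-field source and a linear array with known microphone positions. Spatial similarity is then the cosine similarity between the beamspace maps of the reference and decoded multi-channel STFTs. The beamformer choice, the aggregation over frames, the speed of sound `c = 343.0` and all function names are illustrative. The second metric, beamformed audio quality, is not covered here.

```python
import cmath
import math
from typing import List, Sequence

# stft[m][t][f]: complex STFT value for microphone m, frame t, frequency bin f
MultiSpec = List[List[List[complex]]]


def beamspace(stft: MultiSpec, freqs_hz: Sequence[float], mic_x_m: Sequence[float],
              angles_deg: Sequence[float], c: float = 343.0) -> List[float]:
    """Delay-and-sum beam power for every (look angle, frequency bin), summed over frames.

    Assumes a far-field source and a linear array with mics at positions
    mic_x_m (metres) along one axis; angles are measured from that axis.
    """
    n_mics, n_frames = len(stft), len(stft[0])
    powers = []
    for theta in angles_deg:
        cos_t = math.cos(math.radians(theta))
        for fi, f in enumerate(freqs_hz):
            # Steering vector: phase of a plane wave from `theta` at each mic.
            steer = [cmath.exp(-2j * math.pi * f * x * cos_t / c) for x in mic_x_m]
            power = 0.0
            for t in range(n_frames):
                # Phase-align the channels to the look direction, then average.
                y = sum(steer[m].conjugate() * stft[m][t][fi] for m in range(n_mics)) / n_mics
                power += abs(y) ** 2
            powers.append(power)
    return powers


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine of the angle between two flattened beamspace maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def spatial_similarity(ref: MultiSpec, est: MultiSpec, freqs_hz: Sequence[float],
                       mic_x_m: Sequence[float], angles_deg: Sequence[float]) -> float:
    """Hypothetical helper: cosine similarity of reference vs. decoded beamspace."""
    return cosine_similarity(
        beamspace(ref, freqs_hz, mic_x_m, angles_deg),
        beamspace(est, freqs_hz, mic_x_m, angles_deg),
    )
```

Cosine similarity ignores overall scale, so a decoded signal that is uniformly louder or quieter scores the same as the reference. The metric penalizes energy that appears from the wrong directions. For example, a source recorded at 30° but reconstructed at 150° on a two-microphone array yields a clearly lower score.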
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research
