SpatialCodec: Neural Spatial Speech Coding

Zhongweiyang Xu; Yong Xu; Vinay Kothapally; Heming Wang; Muqiao Yang; Dong Yu

arXiv:2309.07432·cs.SD·July 10, 2024

TL;DR

SpatialCodec introduces a neural spatial audio coding framework that efficiently compresses multi-channel speech recordings while preserving spatial cues, enabling accurate reconstruction and superior spatial performance compared to existing methods.

Contribution

The paper presents a novel neural spatial audio coding framework combining a low-bit-rate sub-band codec with a spatial information encoder, along with new metrics for spatial cue evaluation.

Findings

01. Achieves a high compression ratio while accurately preserving spatial cues.

02. Outperforms high-bitrate baselines and black-box neural architectures on spatial tasks.

03. Provides publicly available demos, code, and models.

Abstract

In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques, with the aim of preserving and accurately reconstructing the crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio by leveraging a single-channel neural sub-band codec and SpatialCodec. Our approach encompasses two phases: (i) a neural sub-band codec encodes the reference channel at low bit rates, and (ii) a SpatialCodec captures relative spatial information for accurate multi-channel reconstruction at the decoder end. We also propose novel evaluation metrics to assess spatial cue preservation: (i) spatial similarity, which calculates cosine similarity on a spatially intuitive beamspace, and (ii) beamformed audio quality. Our system shows superior…
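To make the spatial similarity idea concrete, here is a minimal sketch of a cosine similarity computed in a beamspace. The abstract does not spell out how the beamspace is built, so everything specific here is an assumption for illustration: far-field delay-and-sum beams, a hypothetical 8-microphone circular array, a single frequency bin, and the function names `steering_matrix`, `beamspace_pattern`, and `spatial_similarity`. It should not be read as the paper's exact metric.

```python
import numpy as np

def steering_matrix(mic_xy, angles_rad, freq_hz, c=343.0):
    """Far-field steering vectors, shape (num_angles, num_mics).

    Assumption: plane-wave model in 2-D; not taken from the paper.
    """
    directions = np.stack([np.cos(angles_rad), np.sin(angles_rad)], axis=1)
    delays = directions @ mic_xy.T / c  # relative delay per angle and mic
    return np.exp(-2j * np.pi * freq_hz * delays)

def beamspace_pattern(x, steer):
    """Delay-and-sum output power per look direction for one STFT bin x (num_mics,)."""
    return np.abs(steer.conj() @ x) ** 2

def spatial_similarity(x_ref, x_est, steer):
    """Cosine similarity between the beamspace patterns of two multi-channel bins."""
    p_ref = beamspace_pattern(x_ref, steer)
    p_est = beamspace_pattern(x_est, steer)
    denom = np.linalg.norm(p_ref) * np.linalg.norm(p_est) + 1e-12
    return float(p_ref @ p_est / denom)

# Hypothetical setup: 8-mic circular array, 5 cm radius, one bin at 2 kHz.
mic_angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
mic_xy = 0.05 * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)
look_angles = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
steer = steering_matrix(mic_xy, look_angles, freq_hz=2000.0)

# Simulated single-source bins arriving from 30 and 210 degrees.
x_30 = steering_matrix(mic_xy, np.array([np.deg2rad(30.0)]), 2000.0)[0]
x_210 = steering_matrix(mic_xy, np.array([np.deg2rad(210.0)]), 2000.0)[0]

sim_same = spatial_similarity(x_30, x_30, steer)
sim_diff = spatial_similarity(x_30, x_210, steer)
print(sim_same, sim_diff)
```

In this toy setting a reconstruction that keeps the source direction scores close to 1, while one that moves the source to the opposite side of the array scores clearly lower. A practical version would presumably aggregate over time frames and frequency bins, which this sketch leaves out.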

Code & Models

Repositories

xzwy/spatialcodec (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research