SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation

Changsheng Quan; Xiaofei Li

arXiv:2307.16516 · cs.SD · December 25, 2023 · 2 citations

TL;DR

SpatialNet is a neural network that extensively exploits spatial information in the STFT domain to improve multichannel speech separation, denoising, and dereverberation, achieving state-of-the-art results across various datasets.

Contribution

The paper introduces SpatialNet, a novel neural network architecture that combines narrow-band and cross-band processing to better utilize spatial cues for speech enhancement.

Findings

01 Achieves state-of-the-art performance on multiple tasks

02 Shows robustness against spectral generalization issues

03 Demonstrates effective speaker clustering through attention maps

Abstract

This work proposes a neural network, named SpatialNet, to extensively exploit spatial information for multichannel joint speech separation, denoising and dereverberation. The proposed network performs end-to-end speech enhancement in the short-time Fourier transform (STFT) domain. It is mainly composed of interleaved narrow-band and cross-band blocks, which exploit narrow-band and cross-band spatial information, respectively. The narrow-band blocks process frequencies independently, using a self-attention mechanism for spatial-feature-based speaker clustering and temporal convolutional layers for temporal smoothing/filtering. The cross-band blocks process frames independently, using a full-band linear layer to learn the correlation between all frequencies and frequency convolutional layers to learn the correlation between adjacent frequencies. Experiments are conducted on various…
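The block structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the official PyTorch code is in the repository listed under Code & Models). Everything here is an assumption made for illustration: the function names, the toy sizes, the single attention head, the fixed scalar convolution kernels, the absence of normalization layers, and the order in which the two blocks are interleaved. The only point it aims to show is the axis each block works along: the narrow-band block mixes information across frames within one frequency, and the cross-band block mixes information across frequencies within one frame.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along one axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def narrow_band_block(x, wq, wk, wv, t_kernel):
    """x has shape (F, T, H). Each frequency is processed independently.

    Self-attention runs over the T frames of one frequency, followed by a
    1-D convolution along time. Weights are shared across frequencies.
    """
    q, k, v = x @ wq, x @ wk, x @ wv                           # (F, T, H) each
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])   # (F, T, T)
    x = x + softmax(scores) @ v                                # residual attention
    pad = len(t_kernel) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)))               # zero-pad the time axis
    conv = sum(w * xp[:, i:i + x.shape[1]] for i, w in enumerate(t_kernel))
    return x + conv                                            # residual temporal conv

def cross_band_block(x, f_kernel, w_full):
    """x has shape (F, T, H). Each frame is processed independently.

    A convolution along frequency covers adjacent frequencies, and a
    full-band linear map (an F x F matrix) covers all frequencies.
    """
    pad = len(f_kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))               # zero-pad the frequency axis
    x = x + sum(w * xp[i:i + x.shape[0]] for i, w in enumerate(f_kernel))
    return x + np.einsum("gf,fth->gth", w_full, x)             # mix all frequencies

# Toy sizes, assumed for illustration: F frequencies, T frames, H hidden units.
F, T, H = 5, 6, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((F, T, H))
wq, wk, wv = (0.1 * rng.standard_normal((H, H)) for _ in range(3))
w_full = 0.1 * rng.standard_normal((F, F))
kernel = np.array([0.25, 0.5, 0.25])

y = x
for _ in range(2):  # two interleaved block pairs; the order is an assumption
    y = cross_band_block(y, kernel, w_full)
    y = narrow_band_block(y, wq, wk, wv, kernel)
print(y.shape)  # (5, 6, 4)
```

One way to check the independence property is to perturb a single frequency of `x` and confirm that `narrow_band_block` leaves every other frequency's output unchanged. The same check applies along the time axis for `cross_band_block`.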

Code & Models

Repositories

audio-westlakeu/nbss (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing