SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation
Changsheng Quan, Xiaofei Li

TL;DR
SpatialNet is a neural network that extensively exploits spatial information in the STFT domain to improve multichannel speech separation, denoising, and dereverberation, achieving state-of-the-art results across various datasets.
Contribution
The paper introduces SpatialNet, a novel neural network architecture that combines narrow-band and cross-band processing to better utilize spatial cues for speech enhancement.
Findings
Achieves state-of-the-art performance on multiple tasks
Remains robust to the spectral generalization problem
Demonstrates effective speaker clustering through attention maps
Abstract
This work proposes a neural network to extensively exploit spatial information for multichannel joint speech separation, denoising and dereverberation, named SpatialNet. In the short-time Fourier transform (STFT) domain, the proposed network performs end-to-end speech enhancement. It is mainly composed of interleaved narrow-band and cross-band blocks to respectively exploit narrow-band and cross-band spatial information. The narrow-band blocks process frequencies independently, and use self-attention mechanism and temporal convolutional layers to respectively perform spatial-feature-based speaker clustering and temporal smoothing/filtering. The cross-band blocks process frames independently, and use full-band linear layer and frequency convolutional layers to respectively learn the correlation between all frequencies and adjacent frequencies. Experiments are conducted on various…
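The split between narrow-band and cross-band processing described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The tensor layout `[batch, frequency, time, hidden]`, the function names, the attention without learned projections, and the plain F-by-F mixing matrix are all assumptions made for illustration, and the temporal and frequency convolutional layers are left out. The sketch only shows how folding the frequency axis into the batch makes each frequency attend over its own frames, and how folding the time axis into the batch lets a full-band linear layer mix all frequencies within one frame.

```python
import numpy as np


def narrow_band_view(x):
    """Fold frequency into the batch axis: [B, F, T, C] -> [B*F, T, C].

    Each frequency then forms its own sequence over time (assumed layout).
    """
    B, F, T, C = x.shape
    return x.reshape(B * F, T, C)


def cross_band_view(x):
    """Fold time into the batch axis: [B, F, T, C] -> [B*T, F, C].

    Each frame then forms its own sequence over frequency (assumed layout).
    """
    B, F, T, C = x.shape
    return x.transpose(0, 2, 1, 3).reshape(B * T, F, C)


def self_attention(seq):
    """Single-head scaled dot-product self-attention over axis 1.

    Hypothetical simplification: queries, keys and values are the input
    itself, with no learned projections.
    """
    scores = seq @ seq.transpose(0, 2, 1) / np.sqrt(seq.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ seq, weights


def full_band_linear(frames, weight):
    """Mix all frequencies inside each frame: [N, F, C] with [F, F] -> [N, F, C]."""
    return np.einsum("gf,nfc->ngc", weight, frames)


# Toy hidden representation: 2 utterances, 5 frequencies, 7 frames, 4 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 7, 4))

# Narrow-band step: attention over time, run separately for every frequency.
nb_out, attn = self_attention(narrow_band_view(x))
print(nb_out.shape)  # (10, 7, 4)

# Cross-band step: one linear map across all frequencies, run separately per frame.
mix = rng.standard_normal((5, 5))  # hypothetical full-band weights
cb_out = full_band_linear(cross_band_view(x), mix)
print(cb_out.shape)  # (14, 5, 4)
```

In this sketch, each row of `attn` holds one frame's attention weights over all frames of the same frequency. Perturbing one frequency of `x` leaves the narrow-band output of the other frequencies unchanged, and perturbing one frame leaves the cross-band output of the other frames unchanged, which is the independence property the abstract states for the two block types.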
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
