Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain
Dejan Markovic, Alexandre Defossez, Alexander Richard

TL;DR
This paper introduces an end-to-end neural model for multichannel source separation that implicitly performs spatial filtering directly in the waveform domain, separating moving sound sources without hand-crafted spatial features.
Contribution
The proposed model is a novel single-stage causal waveform-to-waveform neural network that performs implicit spatial filtering for dynamic multichannel source separation.
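To make "causal" and "spatial filtering in the waveform domain" concrete, here is a minimal sketch of the classical, explicit counterpart: a causal filter-and-sum spatial filter, in which each microphone channel is passed through its own FIR filter and the results are summed. This is not the authors' model, which is a neural network trained end-to-end with no explicit filters. The function name `causal_filter_and_sum` and the toy filters are hypothetical and are only meant to show that each output sample depends on current and past input samples alone.

```python
import numpy as np


def causal_filter_and_sum(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Hypothetical causal filter-and-sum spatial filter (illustration only).

    x: (channels, samples) multichannel waveform
    h: (channels, taps) per-channel FIR filters; tap k weights sample n - k
    Returns a single-channel waveform with the same number of samples.
    """
    channels, samples = x.shape
    y = np.zeros(samples)
    for c in range(channels):
        # Full convolution truncated to the first `samples` outputs, so
        # y[n] only uses x[c, :n + 1] (no look-ahead, i.e. causal).
        y += np.convolve(x[c], h[c])[:samples]
    return y


# Delay-and-sum special case: the wavefront reaches mic 1 two samples after
# mic 0, so mic 0 is delayed by two samples to align the arrivals.
x = np.zeros((2, 32))
x[0, 10] = 1.0
x[1, 12] = 1.0
h = np.zeros((2, 3))
h[0, 2] = 1.0  # delay channel 0 by 2 samples
h[1, 0] = 1.0  # pass channel 1 through unchanged
y = causal_filter_and_sum(x, h)
print(y.argmax(), y[12])  # 12 2.0
```

A fixed filter like this only steers toward one direction; the paper's point is that a neural network can learn the equivalent spatial selectivity implicitly, including for moving sources.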
Findings
Matches the performance of an oracle beamformer followed by a state-of-the-art single-channel enhancement network
Effective in dynamic acoustic scenes with moving sources
Operates without hand-crafted spatial features or traditional signal-processing components
Abstract
We present a single-stage causal waveform-to-waveform multichannel model that can separate moving sound sources based on their broad spatial locations in a dynamic acoustic scene. We divide the scene into two spatial regions containing, respectively, the target and the interfering sound sources. The model is trained end-to-end and performs spatial processing implicitly, without any components based on traditional processing or use of hand-crafted spatial features. We evaluate the proposed model on a real-world dataset and show that the model matches the performance of an oracle beamformer followed by a state-of-the-art single-channel enhancement network.
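The two-region formulation implies that, for training, the mixture contains all sources while the target contains only the sources inside the target region. The abstract does not say how the regions are defined or how boundary crossings are labeled, so the sketch below assumes a frontal azimuth sector of plus or minus 45 degrees and a per-sample label for each source. The function `split_by_region` and its `half_width_deg` parameter are hypothetical, and `sources` is assumed to hold each source's multichannel image at the microphone array.

```python
import numpy as np


def split_by_region(sources: np.ndarray, azimuths_deg: np.ndarray,
                    half_width_deg: float = 45.0):
    """Hypothetical two-region training-pair construction (illustration only).

    sources: (num_sources, channels, samples) per-source multichannel images
    azimuths_deg: (num_sources, samples) azimuth trajectory of each source
    Returns (mixture, target), each of shape (channels, samples).
    """
    # Wrap azimuths into [-180, 180) so that, e.g., 350 deg becomes -10 deg.
    wrapped = (azimuths_deg + 180.0) % 360.0 - 180.0
    # Assumed target region: a frontal sector of +/- half_width_deg.
    in_target = np.abs(wrapped) <= half_width_deg  # (num_sources, samples)
    mixture = sources.sum(axis=0)
    # Keep each source only at the samples where it lies in the target region.
    target = (sources * in_target[:, None, :]).sum(axis=0)
    return mixture, target


# Source 0 is a static interferer at 120 deg; source 1 moves from 30 to 60 deg.
srcs = np.stack([np.ones((2, 10)), 2.0 * np.ones((2, 10))])
az = np.array([[120.0] * 10, [30.0] * 5 + [60.0] * 5])
mix, tgt = split_by_region(srcs, az)
print(tgt[0])  # [2. 2. 2. 2. 2. 0. 0. 0. 0. 0.]
```

Labeling per sample means a source that leaves the target sector drops out of the target, which is one plausible way to handle moving sources; the paper may define the regions differently.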
Taxonomy
Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing
