On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement
Kristina Tesch, Nils-Hendrik Mohrmann, Timo Gerkmann

TL;DR
This paper investigates how deep neural networks process spatial, spectral, and temporal information for multi-channel speech enhancement, highlighting the importance of non-linear spatial filtering and joint processing for improved performance.
Contribution
The study provides experimental insights into the internal mechanisms of DNN-based non-linear filters, emphasizing the benefits of joint spatial, spectral, and temporal processing.
Findings
Non-linear spatial filtering outperforms an oracle linear spatial filter by 0.24 in POLQA score.
Joint processing leads to a performance gap of 0.4 in POLQA score between architectures that exploit spectral versus temporal information in addition to spatial information.
Non-linear spatial filtering is crucial for effective speech enhancement.
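One way to picture what spatial, spectral, and temporal processing mean for a DNN-based filter is to look at the axis along which a sequence model runs over the multi-channel STFT. The sketch below is a hypothetical illustration, not the paper's architecture. It assumes an STFT indexed as `X[m][k][l]` (microphone, frequency bin, frame) and stacks the real and imaginary parts of all channels at one time-frequency bin as the spatial feature vector. It then builds either one sequence per frame running over frequency (spatial plus spectral) or one sequence per frequency bin running over time (spatial plus temporal). All function names and the toy data are assumptions.

```python
"""Hypothetical sketch of spatial/spectral/temporal input arrangements.

Assumes X[m][k][l]: complex STFT of microphone m, frequency bin k, frame l.
Names and toy values are illustrative, not taken from the paper.
"""


def spatial_features(X, k, l):
    """Stack Re/Im of all M channels at bin (k, l): a 2M-dim spatial feature vector."""
    feats = []
    for channel in X:
        z = channel[k][l]
        feats.extend([z.real, z.imag])
    return feats


def spectral_sequences(X):
    """One sequence per frame l, running over frequency k (spatial + spectral)."""
    num_bins, num_frames = len(X[0]), len(X[0][0])
    return [[spatial_features(X, k, l) for k in range(num_bins)]
            for l in range(num_frames)]


def temporal_sequences(X):
    """One sequence per frequency bin k, running over time l (spatial + temporal)."""
    num_bins, num_frames = len(X[0]), len(X[0][0])
    return [[spatial_features(X, k, l) for l in range(num_frames)]
            for k in range(num_bins)]


# Toy input: M = 2 microphones, K = 3 bins, L = 4 frames (made-up values).
M, K, L = 2, 3, 4
X = [[[complex(m + 1, k * L + l) for l in range(L)] for k in range(K)]
     for m in range(M)]

spec = spectral_sequences(X)
temp = temporal_sequences(X)
print(len(spec), len(spec[0]), len(spec[0][0]))  # 4 3 4: L sequences of K steps, 2M features
print(len(temp), len(temp[0]), len(temp[0][0]))  # 3 4 4: K sequences of L steps, 2M features
```

In a real system each sequence would be passed to a recurrent or attention layer that also outputs the filter; that part is deliberately omitted, and this arrangement alone does not reproduce any of the reported POLQA results.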
Abstract
Employing deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement has potentially two key advantages over a traditional approach combining a linear spatial filter with an independent tempo-spectral post-filter: 1) non-linear spatial filtering makes it possible to overcome potential restrictions originating from a linear processing model and 2) joint processing of spatial and tempo-spectral information makes it possible to exploit interdependencies between different sources of information. A variety of DNN-based non-linear filters have been proposed recently, for which good enhancement performance is reported. However, little is known about their internal mechanisms, which turns network architecture design into a game of chance. Therefore, in this paper, we perform experiments to better understand the internal processing of spatial, spectral and temporal information by…
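To make the "restrictions originating from a linear processing model" more concrete, the sketch below works on a single time-frequency bin. It contrasts a linear spatial filter (filter-and-sum, y = w^H x) with the same filter followed by a signal-dependent post-filter gain, and checks the superposition property. This is a minimal illustration under stated assumptions, not the authors' method: the filter-and-sum weights, the Wiener-like form of the post-filter gain, the noise-power value, and all function names are hypothetical.

```python
"""Hypothetical sketch: a linear spatial filter satisfies superposition,
a signal-dependent post-filter gain does not. All values are illustrative."""


def linear_spatial_filter(weights, x):
    """Filter-and-sum for one frequency bin: y = w^H x (conjugated weights)."""
    return sum(w.conjugate() * xm for w, xm in zip(weights, x))


def wiener_like_postfilter(y, noise_power):
    """Assumed Wiener-like single-channel gain; noise_power must be > 0.

    The gain depends on |y|^2, so the output is no longer linear in y.
    """
    signal_power = max(abs(y) ** 2 - noise_power, 0.0)
    gain = signal_power / (signal_power + noise_power)
    return gain * y


def is_additive(f, x1, x2, tol=1e-9):
    """Check the superposition property f(x1 + x2) == f(x1) + f(x2) for one bin."""
    x_sum = [a + b for a, b in zip(x1, x2)]
    return abs(f(x_sum) - (f(x1) + f(x2))) < tol


w = [0.5 + 0j, 0.5 + 0j]        # hypothetical weights for M = 2 microphones
x1 = [1.0 + 0.5j, 0.8 - 0.2j]   # hypothetical STFT values of source 1 at both mics
x2 = [-0.3 + 0.1j, 0.2 + 0.4j]  # hypothetical STFT values of source 2 at both mics


def spatial(x):
    """Linear spatial filter only."""
    return linear_spatial_filter(w, x)


def pipeline(x):
    """Traditional two-stage approach: linear spatial filter, then post-filter."""
    return wiener_like_postfilter(linear_spatial_filter(w, x), noise_power=0.1)


print(is_additive(spatial, x1, x2))   # True: linear in the microphone signals
print(is_additive(pipeline, x1, x2))  # False: the post-filter gain depends on the input
```

Given its weights, the spatial filter's output is a fixed linear combination of the microphone signals, whatever they contain. In the traditional approach, the non-linearity only enters through the separate single-channel post-filter. A DNN-based non-linear filter, as described in the abstract, instead learns one joint mapping that is not bound by this two-stage structure.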
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Advanced Adaptive Filtering Techniques
