Multichannel Speech Enhancement without Beamforming
Asutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, and DeLiang Wang

TL;DR
This paper introduces a two-stage multichannel speech enhancement method that achieves high performance without a traditional beamformer, by combining a novel attentive dense convolutional network (ADCN) with a triple-path attentive recurrent network (TPARN).
Contribution
It proposes a novel attentive dense convolutional network (ADCN) for estimating the real and imaginary parts of the complex spectrogram, and demonstrates that a two-stage approach with a stronger first model outperforms traditional beamforming methods.
Findings
ADCN obtains state-of-the-art results among single-stage models
A two-stage approach (ADCN followed by TPARN) improves speech enhancement performance
The approach is effective without a traditional beamformer
Abstract
Deep neural networks are often coupled with traditional spatial filters, such as MVDR beamformers, to exploit spatial information effectively. Even though single-stage end-to-end supervised models can obtain impressive enhancement, combining them with a traditional beamformer and a DNN-based post-filter in multistage processing provides additional improvements. In this work, we propose a two-stage strategy for multichannel speech enhancement that does not require a traditional beamformer for additional performance. First, we propose a novel attentive dense convolutional network (ADCN) for estimating the real and imaginary parts of the complex spectrogram. ADCN obtains state-of-the-art results among single-stage models. Next, we use ADCN with a recently proposed triple-path attentive recurrent network (TPARN) for estimating waveform samples. The proposed strategy uses two insights; first,…
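To make the phrase "real and imaginary parts of the complex spectrogram" concrete, the sketch below shows one way a multichannel mixture could be turned into that representation and back. It is not the ADCN or TPARN architecture. The function names, the 512-sample frame, the 256-sample hop, the Hann window, and the four-channel random input are all assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Naive STFT of a 1-D signal; returns a complex array of shape (frames, bins).

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def to_real_imag(mixture):
    """Map (channels, samples) audio to real-valued features (2 * channels, frames, bins).

    The first half of the output holds real parts, the second half imaginary parts.
    """
    spec = np.stack([stft(ch) for ch in mixture])  # (channels, frames, bins), complex
    return np.concatenate([spec.real, spec.imag], axis=0)

def from_real_imag(real, imag):
    """Recombine real and imaginary parts into a complex spectrogram."""
    return real + 1j * imag

# Hypothetical 4-microphone mixture, one second at 16 kHz (random noise as a stand-in).
rng = np.random.default_rng(0)
mixture = rng.standard_normal((4, 16000))

features = to_real_imag(mixture)
recovered = from_real_imag(features[0], features[4])  # channel 0: real part, imaginary part
```

A network that predicts the two real-valued maps for the target speech can have its output recombined with `from_real_imag` and inverted to a waveform; how the paper's second stage consumes the first-stage output is not specified in the text above, so it is left out of this sketch.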
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis
