Multichannel Speech Enhancement without Beamforming

Asutosh Pandey; Buye Xu; Anurag Kumar; Jacob Donley; Paul Calamia; DeLiang Wang

arXiv:2110.13130·cs.SD·April 7, 2022

Open Access

TL;DR

This paper introduces a two-stage multichannel speech enhancement method that achieves high performance without relying on a traditional beamformer, by combining a novel attentive dense convolutional network (ADCN) with a triple-path attentive recurrent network (TPARN).

Contribution

It proposes a novel attentive dense convolutional network (ADCN) for estimating the real and imaginary parts of the complex spectrogram, and demonstrates that a two-stage approach with a stronger first model outperforms traditional beamforming methods.

Findings

01 State-of-the-art results with ADCN in single-stage models

02 Two-stage approach improves speech enhancement performance

03 Effective without traditional beamforming techniques

Abstract

Deep neural networks are often coupled with traditional spatial filters, such as MVDR beamformers, to exploit spatial information effectively. Even though single-stage end-to-end supervised models can obtain impressive enhancement, combining them with a traditional beamformer and a DNN-based post-filter in multistage processing provides additional improvements. In this work, we propose a two-stage strategy for multichannel speech enhancement that does not require a traditional beamformer for additional performance. First, we propose a novel attentive dense convolutional network (ADCN) for estimating the real and imaginary parts of the complex spectrogram. ADCN obtains state-of-the-art results among single-stage models. Next, we use ADCN with a recently proposed triple-path attentive recurrent network (TPARN) for estimating waveform samples. The proposed strategy uses two insights; first,…
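The data flow described in the abstract (a first stage that estimates the real and imaginary parts of a complex spectrogram, followed by a second stage that estimates waveform samples) can be sketched roughly as below. This is only an illustration of how the two stages compose, not the authors' implementation: `stage_one` and `stage_two` are hypothetical placeholders standing in for ADCN and TPARN, the STFT settings are arbitrary, and passing the noisy microphone signals to the second stage alongside the first-stage estimate is an assumption rather than something the abstract states.

```python
import numpy as np

N_FFT, HOP = 256, 128                  # assumed analysis settings, not from the paper
WINDOW = np.hanning(N_FFT + 1)[:-1]    # periodic Hann window

def stft(x):
    """Complex STFT of a 1-D signal, shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(spec):
    """Weighted overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=-1) * WINDOW
    out = np.zeros(N_FFT + HOP * (len(spec) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-12)

def stage_one(mix_spec):
    """Hypothetical stand-in for the spectrogram stage.

    Input: complex array (channels, frames, bins). Output: (real, imag) parts
    of a single-channel estimate. Here it only passes channel 0 through.
    """
    ref = mix_spec[0]
    return ref.real, ref.imag

def stage_two(first_estimate, mics):
    """Hypothetical stand-in for the waveform stage; identity placeholder."""
    return first_estimate

def two_stage_enhance(mics):
    """Run both stages on a (channels, samples) array of microphone signals."""
    mix_spec = np.stack([stft(ch) for ch in mics])
    real, imag = stage_one(mix_spec)
    first_estimate = istft(real + 1j * imag)   # back to the time domain
    return stage_two(first_estimate, mics)

rng = np.random.default_rng(0)
mics = rng.standard_normal((4, 16000))         # synthetic 4-microphone input
enhanced = two_stage_enhance(mics)
```

Because both placeholders are pass-throughs, the output simply reproduces the reference channel; a trained model would replace each stand-in while keeping the same interfaces.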

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis