Embedding and Beamforming: All-neural Causal Beamformer for Multichannel Speech Enhancement

Andong Li; Wenzhe Liu; Chengshi Zheng; Xiaodong Li

arXiv:2109.00265·cs.SD·September 3, 2021·1 cites

TL;DR

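This paper introduces an all-neural causal beamformer for multichannel speech enhancement. A network learns an embedding that carries both spectral and spatial information and derives the beamforming weights directly, without explicitly estimating the spatial covariance matrix. The system is reported to outperform previous baselines on the DNS-Challenge dataset.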

Contribution

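The paper proposes a causal neural beamformer paradigm, Embedding and Beamforming, built on two core modules. The embedding module (EM) learns a 3-D embedding tensor in place of an explicit spatial covariance matrix estimate, and the beamforming module (BM) uses a network to derive the beamforming weights directly for the filter-and-sum operation.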

Findings

01. Outperforms previous baselines on multiple metrics

02. Suppresses residual noise with a post-processing module

03. Evaluated on the DNS-Challenge dataset

Abstract

The spatial covariance matrix has been considered significant for beamformers. Standing at the intersection of traditional beamformers and deep neural networks, we propose a causal neural beamformer paradigm called Embedding and Beamforming, with two core modules designed accordingly, namely EM and BM. For EM, instead of estimating the spatial covariance matrix explicitly, a 3-D embedding tensor is learned by the network, in which both spectral and spatial discriminative information can be represented. For BM, a network is directly leveraged to derive the beamforming weights so as to implement the filter-and-sum operation. To improve the speech quality, a post-processing module is introduced to further suppress the residual noise. Based on the DNS-Challenge dataset, we conduct experiments for multichannel speech enhancement, and the results show that the proposed system…
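The filter-and-sum operation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes complex, per-frame weights applied in the STFT domain with the common w^H x convention, and the function name `filter_and_sum`, the array shapes, and the uniform stand-in weights are all hypothetical. In the paper, the weights would come from the BM network rather than being fixed.

```python
import numpy as np


def filter_and_sum(weights, mixture):
    """Apply per-bin complex weights to a multichannel STFT and sum over microphones.

    weights, mixture: complex arrays of shape (mics, frames, freqs).
    Returns the single-channel STFT of shape (frames, freqs).
    """
    if weights.shape != mixture.shape:
        raise ValueError("weights and mixture must share shape (mics, frames, freqs)")
    # y(t, f) = sum_m conj(w_m(t, f)) * x_m(t, f), i.e. w^H x in every time-frequency bin.
    # The conjugate is a convention assumed here, not taken from the paper.
    return np.einsum("mtf,mtf->tf", np.conj(weights), mixture)


# Toy data standing in for a 4-microphone noisy STFT (assumed sizes).
rng = np.random.default_rng(0)
mics, frames, freqs = 4, 10, 161
mixture = rng.standard_normal((mics, frames, freqs)) \
    + 1j * rng.standard_normal((mics, frames, freqs))

# Stand-in weights: uniform averaging across microphones.
weights = np.full((mics, frames, freqs), 1.0 / mics, dtype=complex)

enhanced = filter_and_sum(weights, mixture)
print(enhanced.shape)  # (10, 161)
```

With uniform weights the result is simply the channel average. A learned beamformer would instead produce a different weight vector for each frame and frequency bin, using only current and past frames to stay causal.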

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Structural Health Monitoring Techniques