Attention-Based Beamformer For Multi-Channel Speech Enhancement

Jinglin Bai; Hao Li; Xueliang Zhang; Fei Chen

arXiv:2409.06456 · cs.SD · September 16, 2024

TL;DR

This paper introduces an attention-based approach to multi-channel speech enhancement that estimates spatial covariance matrices (SCMs) dynamically over time, which lets it handle moving sources and outperform traditional mask-based methods.

Contribution

The paper proposes a novel attention-based mechanism for estimating the speech and noise SCMs used in MVDR beamforming. Spatial information is incorporated through an in-place convolution operator and a frequency-independent LSTM, and the whole system is optimized end-to-end.
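The core idea is to replace a single time-averaged SCM with one that can change from frame to frame. The sketch below illustrates that idea for one frequency bin under loud assumptions: it uses plain scaled dot-product attention over frames, and the `query`/`key` embeddings and the function name `attention_scm` are hypothetical stand-ins for whatever the trained network (in-place convolutions plus frequency-independent LSTM) produces. It is not the authors' architecture, only an illustration of how attention weights can turn masked outer products into a per-frame SCM.

```python
import numpy as np

def attention_scm(Y, mask, query, key):
    """Hypothetical attention-weighted, time-varying SCM for one frequency bin.

    Y:          (T, M) complex STFT frames for M microphones.
    mask:       (T,) real time-frequency mask in [0, 1] (speech or noise).
    query, key: (T, D) real frame embeddings; in a trained model these would
                come from a network (assumption: plain arrays here).
    Returns (T, M, M): one Hermitian SCM per output frame.
    """
    D = query.shape[1]
    # Scaled dot-product scores between output frame t and input frame tau.
    scores = query @ key.T / np.sqrt(D)            # (T, T)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax: each row sums to 1
    # Rank-1 outer products y(tau) y(tau)^H, weighted by the mask.
    outer = np.einsum("tm,tn->tmn", Y, Y.conj())   # (T, M, M)
    weighted = mask[:, None, None] * outer
    # Phi(t) = sum_tau attn[t, tau] * mask[tau] * y(tau) y(tau)^H
    return np.einsum("ts,smn->tmn", attn, weighted)
```

With uniform attention (for example, all-zero queries) every frame receives the same time-averaged masked SCM, which is the conventional static estimate. Sharper attention concentrates each estimate on a subset of frames, which is one way such a model could follow a moving source.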

Findings

1. Outperforms baseline methods in speech enhancement quality.
2. Reduces computational complexity and the number of model parameters.
3. Effectively handles moving sources in multi-channel scenarios.

Abstract

Minimum Variance Distortionless Response (MVDR) is a classical adaptive beamformer that theoretically guarantees distortionless transmission of signals from the target direction, which makes it popular in real applications. In practice, its noise-reduction performance depends on how accurately the noise and speech spatial covariance matrices (SCMs) are estimated. Time-frequency masks are often used to compute these SCMs. However, most mask-based beamforming methods assume that the sources are stationary and ignore moving sources, which leads to performance degradation. In this paper, we propose an attention-based mechanism to calculate the speech and noise SCMs and then apply MVDR to obtain the enhanced speech. To fully incorporate spatial information, the in-place convolution operator and a frequency-independent LSTM are applied to facilitate SCM estimation. The model is…
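For reference, the conventional mask-based pipeline the abstract starts from can be sketched for one frequency bin with NumPy: a mask weights the rank-1 outer products y(t) y(t)^H into time-averaged speech and noise SCMs, and MVDR weights are derived from them. The abstract does not say which MVDR variant the paper uses; this sketch assumes the reference-channel form w = Φ_N^{-1} Φ_S u_ref / tr(Φ_N^{-1} Φ_S) of Souden et al. The function names and the diagonal-loading constant `eps` are illustrative choices, not taken from the paper. Averaging over all frames is exactly the static-source assumption the abstract identifies as the weakness.

```python
import numpy as np

def mask_scm(Y, mask):
    """Time-averaged masked SCM for one frequency bin (assumes a static source).

    Y: (T, M) complex STFT frames; mask: (T,) real mask in [0, 1].
    """
    outer = np.einsum("tm,tn->tmn", Y, Y.conj())          # y(t) y(t)^H
    return np.einsum("t,tmn->mn", mask, outer) / max(mask.sum(), 1e-8)

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-8):
    """Reference-channel MVDR weights (Souden-style formulation, assumed here).

    w = (Phi_N^{-1} Phi_S) u_ref / trace(Phi_N^{-1} Phi_S)
    """
    M = phi_n.shape[0]
    # Diagonal loading keeps the noise SCM invertible.
    num = np.linalg.solve(phi_n + eps * np.eye(M), phi_s)  # Phi_N^{-1} Phi_S
    return num[:, ref] / (np.trace(num) + eps)             # ref-th column, normalized

def apply_beamformer(w, Y):
    """Enhanced single-channel output s_hat(t) = w^H y(t) for every frame."""
    return Y @ w.conj()
```

When the speech SCM is rank one, Φ_S = σ² d d^H, this form gives w^H d = d_ref, so the target reaches the output exactly as it was observed at the reference microphone; that is the distortionless constraint expressed with relative transfer functions.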

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis