Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

Xinmeng Xu; Rongzhi Gu; Yuexian Zou

arXiv:2205.01280·eess.AS·May 4, 2022

TL;DR

This paper introduces a novel dual-microphone speech enhancement model using multi-head cross-attention to better learn cross-channel features, combined with a multi-task SNR estimator and spectral gain for improved noise suppression.

Contribution

The paper proposes a new MHCA-CRN architecture that effectively learns cross-channel features and incorporates a multi-task SNR estimator to reduce speech distortion in dual-microphone speech enhancement.
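To make the idea of cross-channel attention concrete, the following is a minimal NumPy sketch of multi-head cross-attention between two per-channel feature sequences: queries are drawn from one microphone's encoded features, keys and values from the other. It is not the authors' implementation. The function name, the parameter dictionary, the random projection weights, and the sizes (10 frames, 16 features, 4 heads) are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(a, b, params, n_heads):
    """Attend from sequence `a` (queries) to sequence `b` (keys and values).

    a: (T_a, D) features of one channel; b: (T_b, D) features of the other.
    params: dict of (D, D) matrices "wq", "wk", "wv", "wo" (hypothetical names).
    Returns the attended output (T_a, D) and the weights (n_heads, T_a, T_b).
    """
    t_a, d_model = a.shape
    d_head = d_model // n_heads

    def split_heads(x):
        # (T, D) -> (n_heads, T, d_head)
        return x.reshape(x.shape[0], n_heads, d_head).transpose(1, 0, 2)

    q = split_heads(a @ params["wq"])
    k = split_heads(b @ params["wk"])
    v = split_heads(b @ params["wv"])

    # Scaled dot-product scores per head: (n_heads, T_a, T_b)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge heads back to (T_a, D)
    context = weights @ v
    merged = context.transpose(1, 0, 2).reshape(t_a, d_model)
    return merged @ params["wo"], weights


# Illustrative usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
T, D, H = 10, 16, 4
params = {name: rng.standard_normal((D, D)) / np.sqrt(D)
          for name in ("wq", "wk", "wv", "wo")}
ch1 = rng.standard_normal((T, D))  # encoded features, microphone 1
ch2 = rng.standard_normal((T, D))  # encoded features, microphone 2
out, attn = multi_head_cross_attention(ch1, ch2, params, n_heads=H)
```

Each row of `attn` sums to one, so every frame of the first channel forms a convex combination of the second channel's value vectors. Swapping the arguments gives the attention in the opposite direction.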

Findings

01

Outperforms several state-of-the-art models in speech enhancement tasks.

02

Effectively learns mutual relationships between spatial and spectral features.

03

Reduces residual noise and speech distortion.

Abstract

Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep learning based dual-microphone speech enhancement (DMSE) systems. However, learning the mutual relationship between artificially designed spatial and spectral features is hard in the end-to-end DMSE. In this work, a novel architecture for DMSE using a multi-head cross-attention based convolutional recurrent network (MHCA-CRN) is presented. The proposed MHCA-CRN model includes a channel-wise encoding structure for preserving intra-channel features and a multi-head cross-attention mechanism for fully exploiting cross-channel features. In addition, the proposed approach specifically formulates the decoder with an extra SNR estimator to estimate frame-level SNR under a multi-task learning framework, which is expected to avoid speech…
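For context, the hand-crafted spatial features named in the abstract are commonly derived from the short-time Fourier transforms of the two microphone signals. The sketch below assumes one widespread convention: IID as the log-magnitude ratio in dB and IPD as the phase of the cross-spectrum. Exact definitions vary between papers, and the naive `stft` helper, frame length, hop size, and test signals are illustrative assumptions rather than the paper's settings.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Naive Hann-windowed STFT; returns a (frames, bins) complex array."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def spatial_features(x1, x2, eps=1e-8):
    """Compute IID (dB) and IPD (radians) per time-frequency bin."""
    X1, X2 = stft(x1), stft(x2)
    # Inter-channel intensity difference: log ratio of magnitudes, in dB.
    iid = 20.0 * np.log10((np.abs(X1) + eps) / (np.abs(X2) + eps))
    # Inter-channel phase difference: phase of the cross-spectrum,
    # which is already wrapped to [-pi, pi].
    ipd = np.angle(X1 * np.conj(X2))
    return iid, ipd


# Illustrative usage: the second channel is an attenuated, delayed copy.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(4000)
x2 = 0.5 * np.roll(x1, 2)
iid, ipd = spatial_features(x1, x2)

# Sanity check: identical channels carry no spatial cue.
iid_same, ipd_same = spatial_features(x1, x1)
```

With 4000 samples, a 256-sample frame, and a 128-sample hop, both feature maps have 30 frames and 129 frequency bins.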
