McNet: Fuse Multiple Cues for Multichannel Speech Enhancement

Yujie Yang; Changsheng Quan; Xiaofei Li

arXiv:2211.08872 · eess.AS · November 17, 2022 · 1 citation

TL;DR

This paper introduces McNet, a multi-cue fusion network that combines spectral and spatial information in four cascaded modules to improve multichannel speech enhancement; in the authors' experiments it notably outperforms other state-of-the-art methods.

Contribution

The paper presents McNet, a multi-cue fusion network that cascades four modules to exploit full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral cues for multichannel speech enhancement.

Findings

01

Each of the four modules makes a distinct contribution to performance.

02

McNet outperforms state-of-the-art methods.

03

Jointly exploiting spectral and spatial cues, together with their temporal dynamics, improves enhancement results.

Abstract

In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an interesting research problem. As a solution to this problem, this paper proposes a multi-cue fusion network named McNet, which cascades four modules to respectively exploit the full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module in the proposed network makes a unique contribution and that the network as a whole notably outperforms other state-of-the-art methods.
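The four-module cascade described in the abstract can be pictured as four different "views" of one multichannel STFT tensor: sequences across frequency (full-band) or across time per frequency bin (narrow-band/sub-band). The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation (the official PyTorch code is in the repository listed under Code & Models). The `(T, F, M)` layout, the helper names (`swap_tf`, `sub_band_view`, `stand_in`, `mcnet_like_forward`), the neighbor count, and the choice to feed each module the previous module's output concatenated with its own cue are all hypothetical. A random per-element projection stands in for the recurrent layers, so only the data flow between modules is shown, not any learned temporal modeling.

```python
import numpy as np

# Hypothetical sketch of an McNet-style four-module data flow (not the official code).


def swap_tf(x):
    """Swap the time and frequency axes: (T, F, D) <-> (F, T, D)."""
    return np.transpose(x, (1, 0, 2))


def sub_band_view(mag, n_neighbors):
    """(T, F) magnitude -> (F, T, 2N+1): bin f plus N neighbors on each side, over time."""
    _, F = mag.shape
    padded = np.pad(mag, ((0, 0), (n_neighbors, n_neighbors)), mode="reflect")
    width = 2 * n_neighbors + 1
    # Window f covers padded columns f .. f+2N, i.e. original bins f-N .. f+N.
    return np.stack([padded[:, f:f + width] for f in range(F)], axis=0)


def stand_in(seq, out_dim, rng):
    """Hypothetical placeholder for a (B)LSTM: a random per-element projection.
    It keeps the (batch, sequence, feature) layout but models no context."""
    w = rng.standard_normal((seq.shape[-1], out_dim)) / np.sqrt(seq.shape[-1])
    return np.tanh(seq @ w)


def mcnet_like_forward(X, ref=0, n_neighbors=3, hidden=8, seed=0):
    """X: complex multichannel STFT of shape (T, F, M). Returns a complex (T, F) array."""
    rng = np.random.default_rng(seed)
    spatial = np.concatenate([X.real, X.imag], axis=-1)  # (T, F, 2M) inter-channel cues
    mag = np.abs(X[:, :, ref])                           # (T, F) reference-channel magnitude

    # 1) Full-band spatial: one sequence per frame, running across frequency.
    h1 = stand_in(spatial, hidden, rng)                                  # (T, F, H)
    # 2) Narrow-band spatial: one sequence per frequency bin, running across time.
    h2 = stand_in(swap_tf(np.concatenate([h1, spatial], axis=-1)), hidden, rng)  # (F, T, H)
    # 3) Sub-band spectral: each bin plus its neighboring bins, running across time.
    sub = sub_band_view(mag, n_neighbors)                                # (F, T, 2N+1)
    h3 = stand_in(np.concatenate([h2, sub], axis=-1), hidden, rng)       # (F, T, H)
    # 4) Full-band spectral: back to one sequence per frame across frequency.
    h4 = stand_in(np.concatenate([swap_tf(h3), mag[..., None]], axis=-1), 2, rng)  # (T, F, 2)
    return h4[..., 0] + 1j * h4[..., 1]
```

Calling `mcnet_like_forward` on a random complex array of shape `(T, F, M)` returns a complex `(T, F)` array. The point of the sketch is the alternation of axes: the full-band modules treat frequency as the sequence axis, while the narrow-band and sub-band modules treat each frequency bin (alone or with its neighbors) as an independent time sequence. In a real network the stand-ins would be trained recurrent layers.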

Code & Models

Repositories

audio-westlakeu/mcnet
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques