McNet: Fuse Multiple Cues for Multichannel Speech Enhancement
Yujie Yang, Changsheng Quan, Xiaofei Li

TL;DR
This paper introduces McNet, a multi-cue fusion network that cascades four modules to exploit full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information for multichannel speech enhancement. In the reported experiments it notably outperforms other state-of-the-art methods.
Contribution
The paper presents a multi-cue fusion network architecture that integrates spectral and spatial cues, together with their temporal dynamics, for multichannel speech enhancement.
Findings
Each of the four modules makes its own distinct contribution to performance.
As a whole, McNet notably outperforms other state-of-the-art methods.
Exploiting both spectral and spatial cues improves enhancement results.
Abstract
In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an interesting research problem. As a solution to this problem, this paper proposes a multi-cue fusion network named McNet, which cascades four modules to respectively exploit the full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module in the proposed network has its unique contribution and, as a whole, notably outperforms other state-of-the-art methods.
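The four kinds of information named in the abstract can be pictured as four different views of the same multichannel STFT. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `make_views`, the choice of real and imaginary parts as spatial features, the reference-channel magnitude as the spectral feature, and the `n_neighbors` parameter are all hypothetical. It shows how each view could be laid out as sequences that a recurrent module might process, either along frequency (full-band) or along time (narrow-band and sub-band).

```python
import numpy as np

def make_views(stft, ref_channel=0, n_neighbors=1):
    """Build four illustrative input views from a complex STFT.

    stft: complex array of shape (channels, freq, time).
    All feature choices here are assumptions made for illustration.
    """
    C, F, T = stft.shape

    # Assumed spatial feature: real and imaginary parts of every channel,
    # stacked on the last axis -> (F, T, 2C).
    spatial = np.concatenate([stft.real, stft.imag], axis=0).transpose(1, 2, 0)

    # Full-band spatial: one sequence over frequency per time frame -> (T, F, 2C).
    full_band_spatial = spatial.transpose(1, 0, 2)

    # Narrow-band spatial: one sequence over time per frequency -> (F, T, 2C).
    narrow_band_spatial = spatial

    # Assumed spectral feature: magnitude of the reference channel -> (F, T).
    mag = np.abs(stft[ref_channel])

    # Sub-band spectral: each frequency together with n_neighbors bins on
    # either side (edges repeated), as a sequence over time -> (F, T, 2n+1).
    padded = np.pad(mag, ((n_neighbors, n_neighbors), (0, 0)), mode="edge")
    sub_band_spectral = np.stack(
        [padded[k:k + F] for k in range(2 * n_neighbors + 1)], axis=-1
    )

    # Full-band spectral: magnitude sequence over frequency per frame -> (T, F, 1).
    full_band_spectral = mag.T[..., None]

    return {
        "full_band_spatial": full_band_spatial,
        "narrow_band_spatial": narrow_band_spatial,
        "sub_band_spectral": sub_band_spectral,
        "full_band_spectral": full_band_spectral,
    }
```

In a cascade such as the one the abstract describes, each module would consume one of these views along with the output of the preceding module. How McNet actually combines them is specified in the paper itself, not here.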
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
