Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
Wang Dai, Xiaofei Li, Archontis Politis, Tuomas Virtanen

TL;DR
This paper proposes an adaptive multi-channel masking method for end-to-end speech enhancement that dynamically selects the optimal reference microphone, improving performance over fixed-reference approaches, especially in complex microphone array scenarios.
Contribution
It introduces a novel adaptive reference channel selection technique using multi-channel masking, enhancing flexibility and effectiveness in multi-microphone speech enhancement.
Findings
Outperforms fixed-reference methods on the SPEAR challenge dataset
Achieves higher SI-SDR scores in simulated environments
Demonstrates robustness in varying microphone configurations
Abstract
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. This limitation is particularly pronounced in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances, or with compact, highly directional microphone arrays where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it impossible to adaptively select the reference channel for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, where multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while…
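The contrast between fixed-reference masking and the multi-channel masking scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the tensor layout (channels, frequency bins, frames), the use of a plain sum to combine the masked signals, and the hand-built masks are all assumptions made for illustration; in the paper the masks would be estimated by a network, which is not modeled here.

```python
import numpy as np

def fixed_reference_output(X, mask, ref=0):
    """Fixed-reference masking: one mask applied to a pre-chosen channel.

    X    : complex STFT, shape (channels, freq_bins, frames)
    mask : shape (freq_bins, frames)
    ref  : index of the reference channel (fixed in advance)
    """
    return mask * X[ref]

def multi_channel_masked_output(X, masks):
    """Multi-channel masking: mask every channel, then combine.

    masks has the same shape as X. Summing over the channel axis is an
    assumed combination rule that yields a single-channel output.
    """
    return np.sum(masks * X, axis=0)

# Toy input: 4 microphones, 5 frequency bins, 6 frames (hypothetical sizes).
rng = np.random.default_rng(0)
C, F, T = 4, 5, 6
X = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))

# Hand-built masks that pass channel 2 and zero out the others.
# In this special case the multi-channel scheme reduces to
# fixed-reference masking with ref=2.
masks = np.zeros((C, F, T), dtype=complex)
masks[2] = 1.0

Y = multi_channel_masked_output(X, masks)
Y_ref = fixed_reference_output(X, np.ones((F, T)), ref=2)
```

Under these assumptions, fixed-reference masking is a special case of the multi-channel form in which all masks but one are zero. Because the masks are free to vary per channel, a model trained on the combined output can shift weight toward whichever channel is most useful instead of being tied to one microphone.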
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis
