Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement

Wang Dai; Xiaofei Li; Archontis Politis; Tuomas Virtanen

arXiv:2406.03228·eess.AS·June 12, 2024·EUSIPCO

Open Access

TL;DR

This paper proposes an adaptive multi-channel masking method for end-to-end speech enhancement that dynamically selects the optimal reference microphone. It improves on fixed-reference approaches, especially in challenging array scenarios such as large distributed arrays or compact, highly directional arrays with moving speakers or microphones.

Contribution

It introduces a novel adaptive reference channel selection technique using multi-channel masking, enhancing flexibility and effectiveness in multi-microphone speech enhancement.

Findings

1. Outperforms fixed-reference methods on the SPEAR challenge dataset
2. Achieves higher SI-SDR scores in simulated environments
3. Demonstrates robustness across varying microphone configurations
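SI-SDR (scale-invariant signal-to-distortion ratio), the metric named in the findings, projects the estimate onto the clean reference and compares the energy of that projection with the energy of the residual: SI-SDR = 10 log10(||a s||^2 / ||a s - s_hat||^2), with a = <s_hat, s> / ||s||^2. The sketch below is a minimal pure-Python version of that standard definition, not the authors' evaluation code. The mean removal and the small eps guard are common conventions assumed here.

```python
import math

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals (sketch)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have equal length")
    # Assumed convention: remove the mean of both signals first.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Optimal scaling of the reference: a = <est, ref> / ||ref||^2
    dot = sum(e * r for e, r in zip(est, ref))
    alpha = dot / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]                  # scaled reference
    residual = [e - t for e, t in zip(est, target)]    # everything else
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    # eps keeps the log finite for perfect or orthogonal estimates.
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled by the optimal gain, multiplying the estimate by any positive constant leaves the score unchanged, so the metric rewards the shape of the waveform rather than its loudness.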

Abstract

In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. This limitation is most pronounced in large distributed microphone arrays, where speaker-to-microphone distances vary, and in compact, highly directional arrays, where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it impossible to select the reference channel adaptively for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, in which multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while…
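The abstract says that several masked signals are combined into one output, but it does not spell out the combination here. The sketch below contrasts a fixed-reference baseline with a multi-channel masking output under stated assumptions: complex-valued STFT-domain masks (one per microphone), combined by plain summation across channels. The function names and the nested-list layout `spec[channel][frame][freq]` are hypothetical and not taken from the paper.

```python
from typing import List

# Hypothetical layout: one spectrogram per channel, indexed [frame][freq].
Spectrogram = List[List[complex]]

def fixed_reference_output(mixture: List[Spectrogram], mask: Spectrogram,
                           ref: int = 0) -> Spectrogram:
    """Baseline: apply a single mask to one fixed reference channel."""
    x = mixture[ref]
    return [[m * v for m, v in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, x)]

def multi_channel_masked_output(mixture: List[Spectrogram],
                                masks: List[Spectrogram]) -> Spectrogram:
    """Apply one mask per channel and sum the masked channels (assumed combination)."""
    if len(mixture) != len(masks):
        raise ValueError("need exactly one mask per channel")
    num_frames, num_freqs = len(mixture[0]), len(mixture[0][0])
    out = [[0j] * num_freqs for _ in range(num_frames)]
    for x_c, m_c in zip(mixture, masks):
        for t in range(num_frames):
            for f in range(num_freqs):
                out[t][f] += m_c[t][f] * x_c[t][f]
    return out
```

Under this assumed summation, a one-hot set of masks (all ones on channel c, zeros elsewhere) reproduces a fixed-reference output on channel c, while soft masks let the output draw on whichever channels carry the cleanest speech in each time-frequency bin. Computing the training loss on the single combined output, rather than against a preselected reference channel, is what leaves that choice to the network.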


Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis