Flexible Multichannel Speech Enhancement for Noise-Robust Frontend
Ante Jukić, Jagadeesh Balam, Boris Ginsburg

TL;DR
This paper introduces a flexible multichannel speech enhancement system that works across different microphone counts and configurations. It matches configuration-specific systems on compact arrays and significantly improves automatic speech recognition (ASR) performance with randomly placed microphones.
Contribution
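It presents a neural mask estimator that uses a transform-attend-concatenate layer to handle cross-channel information, so a single model applies to arbitrary microphone arrays. The estimator is combined with a multichannel filter with automatic reference selection.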
Findings
Effective across several seen and unseen compact array geometries
Matches the performance of fixed configuration-specific systems
Significantly improves ASR performance with randomly placed microphones
Abstract
This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator applicable to different channel counts and configurations and a multichannel filter with automatic reference selection. A transform-attend-concatenate layer is proposed to handle cross-channel information in the mask estimator, which is shown to be effective for arbitrary microphone configurations. The presented evaluation demonstrates the effectiveness of the flexible system for several seen and unseen compact array geometries, matching the performance of fixed configuration-specific systems. Furthermore, a significantly improved ASR performance is observed for configurations with randomly-placed microphones.
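The abstract does not spell out the internals of the transform-attend-concatenate layer, so the following is only a minimal NumPy sketch of one plausible reading: each channel is transformed with shared weights, channels attend to each other within every frame, and the attended features are concatenated back onto each channel. The function name, the weight matrices (`w_t`, `w_q`, `w_k`, `w_v`), the ReLU nonlinearity, and the single-head dot-product attention are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transform_attend_concatenate(x, w_t, w_q, w_k, w_v):
    """Hypothetical sketch of a transform-attend-concatenate layer.

    x:   (channels, frames, features) input, any number of channels
    w_t: (features, hidden) shared per-channel transform (assumed)
    w_q, w_k, w_v: (hidden, hidden) attention projections (assumed)
    Returns an array of shape (channels, frames, 2 * hidden).
    """
    # Transform: the same weights are applied to every channel.
    h = np.maximum(x @ w_t, 0.0)                      # (C, T, H)

    # Attend: move frames to the front so attention runs across channels.
    q = (h @ w_q).transpose(1, 0, 2)                  # (T, C, H)
    k = (h @ w_k).transpose(1, 0, 2)                  # (T, C, H)
    v = (h @ w_v).transpose(1, 0, 2)                  # (T, C, H)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (T, C, C)
    attended = (softmax(scores, axis=-1) @ v).transpose(1, 0, 2)  # (C, T, H)

    # Concatenate: each channel keeps its own features plus the
    # cross-channel summary.
    return np.concatenate([h, attended], axis=-1)     # (C, T, 2H)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features, hidden = 6, 4
    w_t = rng.standard_normal((features, hidden))
    w_q, w_k, w_v = (rng.standard_normal((hidden, hidden)) for _ in range(3))
    # The same weights handle inputs with different channel counts.
    for channels in (2, 4, 7):
        x = rng.standard_normal((channels, 5, features))
        print(channels, transform_attend_concatenate(x, w_t, w_q, w_k, w_v).shape)
```

Because no weight depends on the channel index or the channel count, the same parameters can be applied to arrays of any size, and reordering the input channels simply reorders the output. That property is what makes a layer of this kind a natural fit for the arbitrary microphone configurations described in the abstract.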
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
