Flexible Multichannel Speech Enhancement for Noise-Robust Frontend

Ante Jukić; Jagadeesh Balam; Boris Ginsburg

arXiv:2406.04552·eess.AS·June 10, 2024·WASPAA

TL;DR

This paper introduces a flexible multichannel speech enhancement system that works across different microphone counts and configurations. It matches configuration-specific systems on compact arrays and significantly improves automatic speech recognition (ASR) performance for randomly-placed microphones.

Contribution

The paper presents a neural mask estimator that uses a transform-attend-concatenate layer to handle cross-channel information, making the estimator applicable to arbitrary microphone arrays.
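To make the idea of a channel-count-agnostic layer concrete, here is a minimal NumPy sketch of one way a transform-attend-concatenate step could be arranged. The summary does not give the layer's internals, so everything below is an assumption made for illustration: the shared per-channel transform, the single-head dot-product attention across channels, the output projection, and the names `init_params` and `transform_attend_concatenate` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(features, hidden, seed=0):
    # Hypothetical weights; the same weights are shared by every channel.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(hidden)
    return {
        "w_t": rng.standard_normal((features, hidden)) * scale,
        "w_q": rng.standard_normal((hidden, hidden)) * scale,
        "w_k": rng.standard_normal((hidden, hidden)) * scale,
        "w_v": rng.standard_normal((hidden, hidden)) * scale,
        "w_o": rng.standard_normal((2 * hidden, features)) * scale,
    }


def transform_attend_concatenate(x, params):
    """x has shape (channels, frames, features); output has the same shape."""
    # Transform: shared linear map plus ReLU, applied to each channel alone.
    h = np.maximum(x @ params["w_t"], 0.0)                 # (C, T, H)

    # Attend: for every frame, each channel attends over all channels.
    q = np.swapaxes(h @ params["w_q"], 0, 1)               # (T, C, H)
    k = np.swapaxes(h @ params["w_k"], 0, 1)               # (T, C, H)
    v = np.swapaxes(h @ params["w_v"], 0, 1)               # (T, C, H)
    scores = q @ np.swapaxes(k, 1, 2) / np.sqrt(q.shape[-1])  # (T, C, C)
    attended = np.swapaxes(softmax(scores, axis=-1) @ v, 0, 1)  # (C, T, H)

    # Concatenate: join per-channel and cross-channel features, then project.
    return np.concatenate([h, attended], axis=-1) @ params["w_o"]


params = init_params(features=8, hidden=16)
for num_channels in (2, 5):
    x = np.random.default_rng(1).standard_normal((num_channels, 10, 8))
    print(num_channels, transform_attend_concatenate(x, params).shape)
```

Because no weight depends on the channel index or the channel count, the same parameters apply to any number of microphones, and reordering the input channels simply reorders the output channels. That property is what lets a single estimator serve different array configurations in a sketch like this one.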

Findings

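01. Effective across multiple microphone configurations, including several seen and unseen compact array geometries

02. Matches the performance of fixed configuration-specific systems

03. Significantly improves ASR performance for configurations with randomly-placed microphones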

Abstract

This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator applicable to different channel counts and configurations and a multichannel filter with automatic reference selection. A transform-attend-concatenate layer is proposed to handle cross-channel information in the mask estimator, which is shown to be effective for arbitrary microphone configurations. The presented evaluation demonstrates the effectiveness of the flexible system for several seen and unseen compact array geometries, matching the performance of fixed configuration-specific systems. Furthermore, a significantly improved ASR performance is observed for configurations with randomly-placed microphones.
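The second half of the pipeline, a mask-driven multichannel filter with automatic reference selection, can be sketched as follows. The abstract does not state which filter or selection rule is used, so this example substitutes a standard mask-based MVDR beamformer and picks the reference channel with the highest estimated output SNR. Both choices, along with the names `masked_psd` and `mvdr_with_reference_selection` and the `diag_load` parameter, are assumptions made for illustration rather than the authors' method.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted spatial covariance.

    Y: complex STFT, shape (channels, freqs, frames).
    mask: real values in [0, 1], shape (freqs, frames).
    Returns an array of shape (freqs, channels, channels).
    """
    Yf = np.transpose(Y, (1, 0, 2))                        # (F, C, T)
    num = np.einsum("ft,fct,fdt->fcd", mask, Yf, Yf.conj())
    return num / np.maximum(mask.sum(axis=1), 1e-8)[:, None, None]


def mvdr_with_reference_selection(Y, speech_mask, diag_load=1e-6):
    """Return (enhanced STFT of shape (freqs, frames), chosen reference index)."""
    C = Y.shape[0]
    phi_s = masked_psd(Y, speech_mask)
    phi_n = masked_psd(Y, 1.0 - speech_mask)

    # Diagonal loading keeps the noise covariance well conditioned.
    load = diag_load * np.trace(phi_n, axis1=1, axis2=2).real
    phi_n = phi_n + load[:, None, None] * np.eye(C)

    # MVDR filters for all candidate references at once:
    # column r of W[f] equals Phi_n^{-1} Phi_s e_r / trace(Phi_n^{-1} Phi_s).
    G = np.linalg.solve(phi_n, phi_s)                      # (F, C, C)
    W = G / (np.trace(G, axis1=1, axis2=2)[:, None, None] + 1e-12)

    # Estimated output SNR per candidate reference, summed over frequency.
    sig = np.einsum("fcr,fcd,fdr->r", W.conj(), phi_s, W).real
    noi = np.einsum("fcr,fcd,fdr->r", W.conj(), phi_n, W).real
    ref = int(np.argmax(sig / np.maximum(noi, 1e-12)))

    # Apply the selected filter: X[f, t] = w[f]^H Y[:, f, t].
    w = W[:, :, ref]                                       # (F, C)
    return np.einsum("fc,cft->ft", w.conj(), Y), ref
```

In a full system the speech mask would come from the neural mask estimator; here it is simply an input. Selecting the reference from the data, rather than fixing it to the first microphone, is most relevant when microphones are placed at random and no single channel is reliably closest to the talker.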

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development