Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Junqi Chen; Xiao-Lei Zhang

arXiv:2103.15305 · eess.AS · February 15, 2022

TL;DR

This paper introduces a Scaling Sparsemax algorithm for channel selection in large-scale ad-hoc microphone arrays. By filtering out noisy channels, it reduces word error rate (WER) by over 30% on simulation data and over 20% on semi-real data.

Contribution

It proposes Scaling Sparsemax, a replacement for the Softmax operator in the stream attention mechanism of a multichannel end-to-end speech recognition system, which enables better channel selection in large-scale ad-hoc microphone arrays.

Findings

01 Over 30% WER reduction on simulation data

02 Over 20% WER reduction on semi-real data

03 Effective noise filtering in large-scale arrays

Abstract

Recently, speech recognition with ad-hoc microphone arrays has received much attention. Channel selection is known to be an important problem for ad-hoc microphone arrays; however, it remains largely unexplored in speech recognition, particularly with large-scale ad-hoc microphone arrays. To address this problem, we propose a Scaling Sparsemax algorithm for channel selection in speech recognition with large-scale ad-hoc microphone arrays. Specifically, we first replace the conventional Softmax operator in the stream attention mechanism of a multichannel end-to-end speech recognition system with Sparsemax, which performs channel selection by forcing the weights of noisy channels to zero. Because Sparsemax harshly drives the weights of many channels to zero, we propose Scaling Sparsemax, which penalizes the channels more mildly by setting the weights of…
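
To illustrate the difference between the two operators named in the abstract, here is a minimal sketch of standard Softmax and standard Sparsemax (the simplex-projection form) applied to a vector of per-channel attention scores. It is not the paper's implementation and does not cover Scaling Sparsemax, whose definition is cut off in the abstract above. The function names and the example scores in `channel_scores` are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Standard Softmax: every channel receives a strictly positive weight."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Standard Sparsemax: Euclidean projection of the scores onto the
    probability simplex. Low-scoring channels get a weight of exactly zero."""
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    k = 0              # size of the support (channels with nonzero weight)
    support_sum = 0.0  # sum of the k largest scores
    for j, z in enumerate(z_sorted, start=1):
        cumsum += z
        # Channel j stays in the support while 1 + j * z_(j) > sum of top-j scores.
        if 1.0 + j * z > cumsum:
            k = j
            support_sum = cumsum
    tau = (support_sum - 1.0) / k  # threshold subtracted from every score
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical attention scores for four channels (illustrative values only).
channel_scores = [2.0, 1.5, 0.1, -1.0]
soft_weights = softmax(channel_scores)
sparse_weights = sparsemax(channel_scores)
print(soft_weights)    # all four weights are positive
print(sparse_weights)  # [0.75, 0.25, 0.0, 0.0]
```

With these example scores, Softmax keeps all four channels active, while Sparsemax assigns zero weight to the two lowest-scoring channels. That hard zeroing is the behavior the abstract describes as harsh.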

Taxonomy

Methods: Sparsemax · Softmax