Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays
Junqi Chen, Xiao-Lei Zhang

TL;DR
This paper introduces a Scaling Sparsemax algorithm for channel selection in large-scale ad-hoc microphone arrays, significantly improving speech recognition accuracy by effectively filtering noisy channels.
Contribution
It proposes a novel Scaling Sparsemax method that replaces Softmax in attention mechanisms, enabling better channel selection in large-scale microphone arrays for speech recognition.
Findings
Over 30% word error rate (WER) reduction on simulation data
Over 20% WER reduction on semi-real data
Effective noise filtering in large-scale arrays
Abstract
Recently, speech recognition with ad-hoc microphone arrays has received much attention. Channel selection is known to be an important problem for ad-hoc microphone arrays; however, it remains largely unexplored in speech recognition, particularly with large-scale ad-hoc microphone arrays. To address this problem, we propose a Scaling Sparsemax algorithm for channel selection in speech recognition with large-scale ad-hoc microphone arrays. Specifically, we first replace the conventional Softmax operator in the stream attention mechanism of a multichannel end-to-end speech recognition system with Sparsemax, which conducts channel selection by forcing the channel weights of noisy channels to zero. Because Sparsemax harshly forces the weights of many channels to zero, we propose Scaling Sparsemax, which punishes the channels mildly by setting the weights of…
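To make the contrast between Softmax and Sparsemax in the abstract concrete, here is a minimal NumPy sketch of the standard Sparsemax projection (Euclidean projection onto the probability simplex). It is not the paper's Scaling Sparsemax, whose definition is cut off in the abstract above, and it is not taken from the authors' code. The function names and the per-channel attention scores are hypothetical values chosen for illustration.

```python
import numpy as np


def softmax(z):
    """Standard Softmax: every channel gets a strictly positive weight."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def sparsemax(z):
    """Standard Sparsemax: project the scores onto the probability simplex.

    Low-scoring entries receive a weight of exactly zero.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum  # entries that stay non-zero
    k_max = k[support][-1]                 # size of the support set
    tau = (cumsum[k_max - 1] - 1.0) / k_max  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)


# Hypothetical attention scores for four channels (assumed values).
scores = np.array([2.0, 1.5, 0.1, -1.0])
soft_weights = softmax(scores)
sparse_weights = sparsemax(scores)
print("softmax:  ", soft_weights)
print("sparsemax:", sparse_weights)
```

With these assumed scores, Softmax keeps all four channels active, while Sparsemax assigns zero weight to the two lowest-scoring channels. That hard zeroing is the behavior the abstract describes as harsh, and it is what Scaling Sparsemax is proposed to soften.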
Taxonomy
Methods: Sparsemax · Softmax
