End-to-end Models with auditory attention in Multi-channel Keyword Spotting
Haitong Zhang, Junbo Zhang, Yujun Wang

TL;DR
This paper introduces an attention-based end-to-end multi-channel keyword spotting model that outperforms a baseline relying on signal pre-processing, especially in noisy environments, and that gains further robustness from transfer learning and multi-target spectral mapping.
Contribution
The paper presents a novel attention-based end-to-end model for multi-channel keyword spotting that improves robustness and performance using transfer learning and multi-target spectral mapping.
Findings
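Outperforms the signal pre-processing baseline on both clean and noisy test data
Transfer learning and multi-target spectral mapping improve robustness in noisy environments
Achieves an absolute 30% improvement in wake-up rate at 0.1 FA/hour on noisy data (SNR about -20 dB)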
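The headline number is a wake-up rate read off at a fixed false-alarm budget (0.1 FA per hour). The sketch below shows one common way to compute such an operating point; it is not the authors' evaluation code. It assumes one detector score per keyword utterance, one score per candidate detection event on keyword-free audio, and a strict "score > threshold" firing rule. The function name `wake_up_rate_at_fa` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def wake_up_rate_at_fa(
    keyword_scores: Sequence[float],
    negative_scores: Sequence[float],
    negative_hours: float,
    max_fa_per_hour: float = 0.1,
) -> Tuple[float, float]:
    """Hypothetical helper: return (threshold, wake_up_rate) at an FA/hour budget.

    A detection fires when score > threshold. keyword_scores holds one score
    per keyword utterance; negative_scores holds one score per candidate event
    on keyword-free audio lasting negative_hours (both are assumptions).
    """
    if negative_hours <= 0 or not keyword_scores:
        raise ValueError("need positive negative_hours and at least one keyword score")
    # Largest number of false alarms allowed over the whole negative set.
    allowed_fa = math.floor(max_fa_per_hour * negative_hours + 1e-9)
    ranked = sorted(negative_scores, reverse=True)
    if allowed_fa >= len(ranked):
        # Even accepting every negative event stays within the budget.
        threshold = -math.inf
    else:
        # Only scores strictly above the (allowed_fa + 1)-th highest negative
        # score fire, so at most allowed_fa false alarms remain.
        threshold = ranked[allowed_fa]
    hits = sum(1 for s in keyword_scores if s > threshold)
    return threshold, hits / len(keyword_scores)
```

For example, with 10 hours of negative audio and a 0.1 FA/hour budget, one false alarm is allowed, so the threshold sits at the second-highest negative score. An "absolute 30%" gain then means the returned rate rises by 0.30, for instance from 0.50 to 0.80, rather than by 30% of its previous value.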
Abstract
In this paper, we propose an attention-based end-to-end model for multi-channel keyword spotting (KWS), which is trained to optimize the KWS result directly. As a result, our model outperforms the baseline model with signal pre-processing techniques on both the clean and noisy test data. We also find that multi-task learning yields better performance when the training and testing data are similar. Transfer learning and multi-target spectral mapping can dramatically enhance robustness to noisy environments. At 0.1 false alarm (FA) per hour, the model with transfer learning and multi-target mapping gains an absolute 30% improvement in wake-up rate on the noisy data with an SNR of about -20 dB.
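The abstract does not say how attention is applied to the multi-channel input. One common pattern is to score each microphone channel's features with a learned vector, normalize the scores with a softmax over channels, and take the weighted sum as the fused input to the KWS network. The sketch below assumes that pattern; the function `channel_attention` and the scoring vector `w` are hypothetical and may differ from the authors' design.

```python
import math
from typing import List, Sequence, Tuple


def channel_attention(
    frame: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical fusion of one time frame of multi-channel features.

    frame: one feature vector per microphone channel (all the same length).
    w: hypothetical learned scoring vector, same length as each feature vector.
    Returns (fused feature vector, per-channel attention weights).
    """
    # Unnormalized relevance score for each channel: dot product with w.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in frame]
    # Numerically stable softmax over channels.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the channel feature vectors.
    dim = len(frame[0])
    fused = [sum(a * x[d] for a, x in zip(weights, frame)) for d in range(dim)]
    return fused, weights
```

Applied frame by frame, this turns a (channels × time × features) input into a single-channel feature sequence. Because `w` would be trained jointly with the keyword classifier, the channel weighting is optimized for the KWS objective itself, in line with the end-to-end training the abstract describes.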
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Speech Recognition and Synthesis
