Frequency & Channel Attention Network for Small Footprint Noisy Spoken Keyword Spotting
Yuanxi Lin, Yuriy Evgenyevich Gapanyuk

TL;DR
This paper introduces FCA-Net, a lightweight CNN that pairs a novel attention module with a curriculum-based training strategy to improve keyword spotting (KWS) accuracy in noisy conditions while keeping the model small.
Contribution
The paper presents FCA-Net, a new CNN architecture that combines an attention module producing channel- and frequency-specific weights with curriculum-based multi-condition training, improving noise robustness in small-footprint KWS systems.
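The summary does not spell out how the attention weights are computed. As a minimal sketch, assuming a feature map of shape (channels, frequency bins, time frames), one way to obtain a separate weight for every channel-frequency pair is to average over time, run a small 2D convolution across the channel and frequency axes, and squash the result with a sigmoid. The function names, the time-average pooling, and the single 3x3 kernel are illustrative assumptions, not the FCA-Net design.

```python
import numpy as np


def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_frequency_attention(features, kernel, bias=0.0):
    """Hypothetical attention sketch.

    features: array of shape (C, F, T).
    Returns the reweighted features and the (C, F) weight map.
    """
    # Assumption: summarize each (channel, frequency) bin by averaging over time.
    descriptor = features.mean(axis=2)                 # (C, F)
    # A 2D convolution over the channel and frequency axes lets each weight
    # depend on neighbouring channels and neighbouring frequency bands.
    logits = conv2d_same(descriptor, kernel) + bias    # (C, F)
    weights = sigmoid(logits)                          # each weight in (0, 1)
    # Broadcast the (C, F) weights over time and rescale the input.
    return features * weights[:, :, None], weights


# Usage with made-up sizes: 16 channels, 40 mel bands, 101 frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 40, 101))
k = 0.1 * rng.standard_normal((3, 3))
y, w = channel_frequency_attention(x, k)
print(y.shape, w.shape)  # (16, 40, 101) (16, 40)
```

Because the convolution slides over both axes, each weight can draw on neighbouring channels and frequency bands. That is what separates a fine-grained (C, F) map like this from channel-only attention in the squeeze-and-excitation style, which assigns a single weight per channel.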
Findings
Outperforms state-of-the-art small-footprint KWS systems in noisy environments
Uses a novel attention module for channel and frequency-specific feature weighting
Employs curriculum-based multi-condition training for robustness
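The summary gives no details of the curriculum. A minimal sketch, assuming that "curriculum" means moving from high-SNR (easy) to low-SNR (hard) noisy examples as training progresses, and that "multi-condition" means mixing clean and noisy utterances, could look like the following. The SNR stages, the clean-sample probability, and all function names are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical curriculum: SNR ranges in dB, from easy (first) to hard (last).
STAGES = ((20, 30), (10, 20), (0, 10), (-5, 5))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.

    Assumes `noise` is at least as long as `speech`.
    """
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


def curriculum_snr_range(epoch, num_epochs, stages=STAGES):
    """Return the (low, high) SNR range for this epoch: easy stages first, hard last."""
    stage = min(epoch * len(stages) // num_epochs, len(stages) - 1)
    return stages[stage]


def make_training_example(speech, noise, epoch, num_epochs, rng, clean_prob=0.1):
    """Multi-condition sample: sometimes clean, otherwise noisy at a curriculum SNR.

    Returns (waveform, snr_db); snr_db is None for a clean example.
    """
    if rng.random() < clean_prob:
        return speech, None
    low, high = curriculum_snr_range(epoch, num_epochs)
    snr_db = rng.uniform(low, high)
    return mix_at_snr(speech, noise, snr_db), snr_db


# Usage with random signals standing in for a 1 s keyword clip and a noise clip.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(20000)
for epoch in (0, 10, 20, 39):
    _, snr = make_training_example(speech, noise, epoch, 40, rng)
    print(epoch, curriculum_snr_range(epoch, 40), snr)
```

A staged schedule keeps the model from seeing only very noisy inputs before it has learned the keywords in easier conditions; a smooth schedule (for example, linearly lowering the SNR bounds every epoch) would be an equally plausible reading of "curriculum".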
Abstract
In this paper, we aim to improve the robustness of Keyword Spotting (KWS) systems in noisy environments while keeping a small memory footprint. We propose a new convolutional neural network (CNN) called FCA-Net, which combines mixer unit-based feature interaction with a two-dimensional convolution-based attention module. First, we introduce and compare lightweight attention methods for enhancing noise robustness in CNNs. Then, we propose an attention module that creates fine-grained attention weights to capture channel- and frequency-specific information, boosting the model's ability to handle noisy conditions. Combining the mixer unit-based feature interaction with this attention module further improves performance. Additionally, we use a curriculum-based multi-condition training strategy. Our experiments show that our system outperforms current state-of-the-art solutions for small-footprint…
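The abstract does not define the "mixer unit". As a minimal sketch, assuming an MLP-Mixer-style design in which one small MLP mixes information across frequency bins and a second mixes across channels, each with a residual connection, a feature-interaction step over a (channels, frequency, time) map could look like this. The weight shapes, the GELU activation, and the omission of normalization layers are all illustrative assumptions, not the FCA-Net implementation.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def mlp(x, w1, w2):
    """Two-layer MLP applied along the last axis of x."""
    return gelu(x @ w1) @ w2


def mixer_unit(features, freq_w1, freq_w2, chan_w1, chan_w2):
    """Hypothetical mixer unit.

    features: (C, F, T). freq_w1: (F, Hf), freq_w2: (Hf, F),
    chan_w1: (C, Hc), chan_w2: (Hc, C). Returns an array of shape (C, F, T).
    """
    # Frequency mixing: put F on the last axis -> (C, T, F), apply the MLP,
    # swap back to (C, F, T), and add a residual connection.
    x = features + np.swapaxes(mlp(np.swapaxes(features, 1, 2), freq_w1, freq_w2), 1, 2)
    # Channel mixing: put C on the last axis -> (F, T, C), apply the MLP with a residual.
    y = np.transpose(x, (1, 2, 0))
    y = y + mlp(y, chan_w1, chan_w2)
    # Restore the (C, F, T) layout.
    return np.transpose(y, (2, 0, 1))


# Usage with made-up sizes: 16 channels, 40 frequency bins, 101 frames.
rng = np.random.default_rng(0)
C, F, T = 16, 40, 101
x = rng.standard_normal((C, F, T))
out = mixer_unit(
    x,
    0.1 * rng.standard_normal((F, 64)), 0.1 * rng.standard_normal((64, F)),
    0.1 * rng.standard_normal((C, 32)), 0.1 * rng.standard_normal((32, C)),
)
print(out.shape)  # (16, 40, 101)
```

The residual paths mean that with zero output weights the unit reduces to the identity, so it can be stacked in front of an attention module like the one described above without disturbing the features at initialization.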
Taxonomy
Topics: Speech Recognition and Synthesis · Text and Document Classification Technologies · Speech and Audio Processing
Methods: Softmax · Attention Is All You Need
