Neural Architecture Search For Keyword Spotting
Tong Mo, Yakun Yu, Mohammad Salameh, Di Niu, Shangling Jui

TL;DR
This paper employs neural architecture search to design efficient convolutional neural networks that improve keyword spotting accuracy while maintaining manageable memory usage, achieving state-of-the-art results on Google's Speech Commands Dataset.
Contribution
It applies differentiable architecture search to CNN models for keyword spotting: operators and their connections are searched within a predefined cell search space, and the found cells are then scaled up in depth and width to balance accuracy against memory footprint.
Findings
Achieved over 97% accuracy on the 12-class utterance classification setting of Google's Speech Commands Dataset.
Showed that cells found by differentiable architecture search, once scaled up in depth and width, are competitive for keyword spotting.
Kept the memory footprint of the resulting models at an acceptable level.
Abstract
Deep neural networks have recently become a popular solution for keyword spotting systems, which enable the control of smart devices via voice. In this paper, we apply neural architecture search to find convolutional neural network models that can help boost the performance of keyword spotting based on features extracted from acoustic signals while maintaining an acceptable memory footprint. Specifically, we use differentiable architecture search techniques to search for operators and their connections in a predefined cell search space. The found cells are then scaled up in both depth and width to achieve competitive performance. We evaluated the proposed method on Google's Speech Commands Dataset and achieved a state-of-the-art accuracy of over 97% on the setting of 12-class utterance classification commonly reported in the literature.
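The core idea behind differentiable architecture search, as mentioned in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows the generic continuous relaxation used by DARTS-style methods: each edge of a cell computes a softmax-weighted sum of all candidate operators, and after search the operator with the largest architecture weight is kept. The candidate operators, the names `CANDIDATES`, `mixed_op`, and `discretize`, and the example weights are all hypothetical stand-ins chosen for illustration.

```python
import math

# Hypothetical candidate operators on a 1-D feature vector. They stand in for
# the convolution and pooling operators of a real cell search space.
def identity(x):
    return list(x)

def avg_pool3(x):
    # Width-3 average pooling with stride 1; edges average over valid neighbours.
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

def max_pool3(x):
    # Width-3 max pooling with stride 1; edges use the valid neighbours only.
    return [max(x[max(0, i - 1): i + 2]) for i in range(len(x))]

CANDIDATES = {"identity": identity, "avg_pool3": avg_pool3, "max_pool3": max_pool3}

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of every candidate's output."""
    weights = softmax([alphas[name] for name in CANDIDATES])
    outputs = [op(x) for op in CANDIDATES.values()]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]

def discretize(alphas):
    """After search, keep the operator with the largest architecture weight."""
    return max(alphas, key=alphas.get)

# Assumed architecture weights for one edge (in practice learned by gradient descent).
alphas = {"identity": 2.0, "avg_pool3": 0.5, "max_pool3": -1.0}
features = [1.0, 2.0, 3.0]
mixed = mixed_op(features, alphas)
chosen = discretize(alphas)
```

In a real search the architecture weights would be optimized jointly with the network weights, and the discretized cells would then be stacked and widened as the abstract describes; this sketch omits both steps.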
