AutoKWS: Keyword Spotting with Differentiable Architecture Search

Bo Zhang; Wenfeng Li; Qingyuan Li; Weiji Zhuang; Xiangxiang Chu; Yujun Wang

arXiv:2009.03658·eess.AS·February 23, 2021·6 cites

Open Access

TL;DR

This paper introduces AutoKWS, a differentiable neural architecture search method for keyword spotting. It searches for models that balance accuracy and latency, and the searched model reaches 97.2% top-1 accuracy on Google Speech Command Dataset v1 with roughly 100K parameters.

Contribution

It applies differentiable architecture search to optimize lightweight keyword spotting models, surpassing human-designed architectures in efficiency and accuracy.

Findings

1. Achieves 97.2% top-1 accuracy on Google Speech Command Dataset v1.
2. Uses roughly 100K parameters, demonstrating model efficiency.
3. Outperforms traditional manually designed models.

Abstract

Smart audio devices are gated by an always-on, lightweight keyword spotting program to reduce power consumption. It is, however, challenging to design models that offer both high accuracy and low latency, so that the device responds correctly and quickly. Many efforts have been made to develop end-to-end neural networks that adopt depthwise separable convolutions, temporal convolutions, and LSTMs as building units. Nonetheless, these networks, designed with human expertise, may not achieve an optimal trade-off in an expansive search space. In this paper, we propose to leverage recent advances in differentiable neural architecture search to discover more efficient networks. Our searched model attains 97.2% top-1 accuracy on Google Speech Command Dataset v1 with only about 100K parameters.
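To make the idea of differentiable architecture search concrete, here is a minimal sketch of the continuous relaxation commonly used in DARTS-style methods: each edge computes a softmax-weighted mixture of candidate operations, and the final architecture keeps the operation with the largest architecture weight. This is an illustration under stated assumptions, not the AutoKWS implementation. The candidate set (`identity`, `avg_pool3`, `zero`), the function names, and the 1-D toy input are all hypothetical, and the abstract does not specify the actual search space or training procedure.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations on a 1-D feature sequence.
def identity(x):
    return list(x)

def avg_pool3(x):
    # 3-tap moving average with replicated edges.
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def zero(x):
    return [0.0] * len(x)

# Assumed toy search space; a real one would hold convolutional blocks.
CANDIDATES = {"identity": identity, "avg_pool3": avg_pool3, "zero": zero}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alphas)-weighted sum of all candidates."""
    weights = softmax(alphas)
    outputs = [op(x) for op in CANDIDATES.values()]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]

def derive_op(alphas):
    """Discretization: keep the candidate with the largest architecture weight."""
    names = list(CANDIDATES)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print(mixed_op(x, [0.0, 0.0, 0.0]))   # equal mixture of the three ops
    print(derive_op([0.1, 2.0, -1.0]))    # name of the highest-weighted op
```

In a full search, the architecture weights (`alphas` here) would be trained by gradient descent alongside the network weights, which is what makes the search differentiable. The sketch omits that training loop and only shows the forward mixture and the final selection step.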

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis