Hello Edge: Keyword Spotting on Microcontrollers
Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra

TL;DR
This paper evaluates and optimizes neural network architectures for keyword spotting on microcontrollers, demonstrating that high accuracy can be achieved within strict resource constraints.
Contribution
It provides a comprehensive comparison of neural network architectures for KWS on microcontrollers and shows how to optimize them for accuracy and resource efficiency.
Findings
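The depthwise separable convolutional neural network (DS-CNN) achieves the best accuracy, 95.4%, about 10% higher than a DNN model with a similar number of parameters.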
Neural network architectures can be optimized to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy.
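To make the resource argument concrete, here is a minimal sketch of why a depthwise separable convolution needs far fewer weights than a standard convolution with the same shape. The layer dimensions and the 8-bit weight width are illustrative assumptions, not values taken from this page, and the helper names are hypothetical.

```python
def standard_conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a standard 2-D convolution (biases omitted)."""
    return k_h * k_w * c_in * c_out


def ds_conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a depthwise separable convolution (biases omitted).

    Depthwise step: one k_h x k_w filter per input channel.
    Pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    """
    depthwise = k_h * k_w * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def weight_bytes(n_params, bits_per_weight=8):
    """Storage for the weights, assuming a fixed bit width per weight."""
    return n_params * bits_per_weight // 8


# Hypothetical layer: 3x3 kernel, 64 input channels, 64 output channels.
k_h, k_w, c_in, c_out = 3, 3, 64, 64

std = standard_conv_params(k_h, k_w, c_in, c_out)
ds = ds_conv_params(k_h, k_w, c_in, c_out)

print(f"standard conv weights:            {std}")
print(f"depthwise separable conv weights: {ds}")
print(f"reduction factor:                 {std / ds:.2f}x")
print(f"DS weights at 8 bits:             {weight_bytes(ds)} bytes")
```

For this assumed layer the separable form uses roughly an eighth of the weights, which is the kind of saving that matters when the whole model has to fit in a microcontroller's limited memory.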
Abstract
Keyword spotting (KWS) is a critical component for enabling speech based user interactions on smart devices. It requires real-time response and high accuracy for good user experience. Recently, neural networks have become an attractive choice for KWS architecture because of their superior accuracy compared to traditional speech processing algorithms. Due to its always-on nature, KWS application has highly constrained power budget and typically runs on tiny microcontrollers with limited memory and compute capability. The design of neural network architecture for KWS must consider these constraints. In this work, we perform neural network architecture evaluation and exploration for running KWS on resource-constrained microcontrollers. We train various neural network architectures for keyword spotting published in literature to compare their accuracy and memory/compute requirements. We…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Speech and dialogue systems
Methods: Dilated Convolution · Batch Normalization · Residual Connection
