Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha

TL;DR
This paper introduces a temporal convolution approach with a compact ResNet architecture for real-time keyword spotting on mobile devices, achieving more than 385x speedup on Google Pixel 1 while surpassing state-of-the-art accuracy.
Contribution
The paper proposes a novel temporal convolution method with a compact ResNet architecture specifically designed for low-latency keyword spotting on mobile devices.
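To make the idea concrete, here is a minimal pure-Python sketch of a temporal (1D) convolution. It assumes the input is a sequence of feature frames (for example MFCCs) whose coefficients are treated as channels, so the kernel slides only along the time axis. The function name, tensor layout, and toy values are illustrative assumptions and are not taken from the authors' code.

```python
def temporal_conv(x, w):
    """Convolve along time only (stride 1, no padding).

    x: list of T frames, each a list of F channel values.
    w: kernel indexed as w[k][f][c] with shape K x F x C_out.
    Returns T - K + 1 output frames, each with C_out values.
    """
    T, K = len(x), len(w)
    F, C = len(w[0]), len(w[0][0])
    out = []
    for t in range(T - K + 1):
        frame = [0.0] * C
        for k in range(K):          # position inside the temporal window
            for f in range(F):      # input channel (feature coefficient)
                for c in range(C):  # output channel
                    frame[c] += x[t + k][f] * w[k][f][c]
        out.append(frame)
    return out


# Toy input: T=4 frames, F=2 coefficients per frame (hypothetical values).
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
# Kernel of length K=3 with all-ones weights and a single output channel,
# so each output is the sum of every value inside the 3-frame window.
w = [[[1.0] for _ in range(2)] for _ in range(3)]
y = temporal_conv(x, w)
print(y)  # [[21.0], [33.0]]
```

Because every coefficient of a frame feeds every output channel, a single layer already sees the whole frequency range; only the temporal extent depends on kernel length and depth.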
Findings
More than 385x speedup over the prior state-of-the-art model on Google Pixel 1
Surpasses state-of-the-art accuracy
Provides an end-to-end training and evaluation pipeline
Abstract
Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most of the 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency…
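The operation-count concern can be illustrated with back-of-the-envelope arithmetic. The sketch below counts multiply-accumulates (MACs) for a single stride-1, same-padded convolution layer under two views of the same input: a one-channel time-by-frequency image with a 3x3 kernel, and a time sequence whose frequency coefficients are channels with a 3x1 kernel. The sizes `T`, `F`, `C` and the helper `conv_macs` are illustrative assumptions, not values quoted from this page.

```python
def conv_macs(out_h, out_w, k_h, k_w, c_in, c_out):
    """MACs for one stride-1, same-padded convolution layer."""
    return out_h * out_w * k_h * k_w * c_in * c_out


# Hypothetical sizes: T time frames, F coefficients per frame, C output channels.
T, F, C = 98, 40, 160

# 2D view: a T x F image with 1 input channel and a 3x3 kernel.
macs_2d = conv_macs(T, F, 3, 3, 1, C)
out_2d = T * F * C  # elements in the output feature map

# Temporal view: a T x 1 sequence with F input channels and a 3x1 kernel.
macs_temporal = conv_macs(T, 1, 3, 1, F, C)
out_temporal = T * 1 * C

print(macs_2d, macs_temporal)  # 5644800 1881600
print(out_2d, out_temporal)    # 627200 15680
```

Under these assumptions the first layer needs 3 times fewer MACs in the temporal view, and its output feature map is `F` times smaller, which in turn reduces the cost of every layer that follows.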
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
