Sequence-to-sequence Models for Small-Footprint Keyword Spotting
Haitong Zhang, Junbo Zhang, Yujun Wang

TL;DR
This paper introduces a sequence-to-sequence model for keyword spotting that simplifies the production KWS pipeline, meets high-accuracy, low-latency, and small-footprint requirements, and outperforms a recently proposed attention-based end-to-end model on real-world wake-up data.
Contribution
The paper presents a novel sequence-to-sequence architecture for KWS that has a small footprint and outperforms a recently proposed attention-based end-to-end model.
Findings
Achieves 3.05% FRR at 0.1 FA/hour with 73K parameters.
Outperforms a recently proposed attention-based end-to-end model.
Compares LSTM and GRU encoder architectures on real-world wake-up data.
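The headline figure is an operating point on a detection curve: the false rejection rate measured at the threshold where false alarms on negative (non-keyword) audio stay at or below 0.1 per hour. The sketch below shows one way to compute such a point; it is not the authors' evaluation code. The function name `frr_at_fa_per_hour` is hypothetical, and it assumes each keyword utterance and each candidate trigger in the negative audio has already been reduced to a single confidence score.

```python
import math
from typing import Sequence, Tuple


def frr_at_fa_per_hour(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    neg_hours: float,
    target_fa_per_hour: float,
) -> Tuple[float, float]:
    """Return (threshold, FRR) at the loosest threshold meeting the FA/hour target.

    A detection fires when score >= threshold. Assumes (hypothetically) one
    score per keyword utterance and one score per candidate trigger in the
    negative audio.
    """
    if neg_hours <= 0 or not pos_scores:
        raise ValueError("need positive audio duration and at least one keyword utterance")
    # Number of false alarms the target allows over the whole negative set;
    # the small epsilon guards against products like 6.9999999 instead of 7.
    max_fa = math.floor(target_fa_per_hour * neg_hours + 1e-9)
    neg_sorted = sorted(neg_scores, reverse=True)
    if max_fa >= len(neg_sorted):
        threshold = -math.inf  # the budget covers every negative, so accept all
    else:
        # Smallest float above the (max_fa + 1)-th highest negative score,
        # so at most max_fa negatives can reach the threshold.
        threshold = math.nextafter(neg_sorted[max_fa], math.inf)
    # Keyword utterances scoring below the threshold are false rejections.
    false_rejects = sum(1 for s in pos_scores if s < threshold)
    return threshold, false_rejects / len(pos_scores)
```

Evaluations on continuous audio streams usually also merge nearby triggers before counting false alarms; this sketch assumes that segmentation has already happened and requires Python 3.9 or later for `math.nextafter`.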
Abstract
In this paper, we propose a sequence-to-sequence model for keyword spotting (KWS). Compared with other end-to-end architectures for KWS, our model simplifies the pipeline of a production-quality KWS system and satisfies the requirements of high accuracy, low latency, and a small footprint. We also evaluate the performance of different encoder architectures, including LSTM and GRU. Experiments on real-world wake-up data show that our approach outperforms the recently proposed attention-based end-to-end model. Specifically, with 73K parameters, our sequence-to-sequence model achieves a 3.05% false rejection rate (FRR) at 0.1 false alarms (FA) per hour.
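The abstract compares LSTM and GRU encoders under a 73K-parameter budget. A back-of-the-envelope count shows why the choice of cell matters at this size: an LSTM layer has four weight blocks (three gates plus the cell candidate) and a GRU layer has three (two gates plus the candidate), so for the same budget a GRU can afford a wider hidden state. The sketch below assumes a single recurrent layer, hypothetical 40-dimensional input features, and one bias vector per block (the Keras `LSTM` and `GRU(reset_after=False)` convention; PyTorch keeps two bias vectors per block), and it ignores any output layer. It does not reproduce the paper's actual configuration, and the helper names are hypothetical.

```python
def rnn_layer_params(cell: str, input_dim: int, hidden: int) -> int:
    """Parameter count of one recurrent layer: input and recurrent weights
    plus one bias vector per weight block (Keras-style convention)."""
    blocks = {"lstm": 4, "gru": 3}[cell]
    return blocks * (hidden * (input_dim + hidden) + hidden)


def max_hidden_under_budget(cell: str, input_dim: int, budget: int) -> int:
    """Largest hidden size whose single-layer parameter count fits the budget."""
    hidden = 0
    while rnn_layer_params(cell, input_dim, hidden + 1) <= budget:
        hidden += 1
    return hidden


if __name__ == "__main__":
    budget = 73_000  # the paper's reported model size, ~73K parameters
    input_dim = 40  # hypothetical: 40-dimensional filterbank features
    for cell in ("lstm", "gru"):
        h = max_hidden_under_budget(cell, input_dim, budget)
        print(cell, h, rnn_layer_params(cell, input_dim, h))
```

Under these assumptions a single LSTM layer fits 116 hidden units (72,848 parameters) and a GRU layer fits 136 (72,216 parameters).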
