Sequence-to-sequence Models for Small-Footprint Keyword Spotting

Haitong Zhang; Junbo Zhang; Yujun Wang

arXiv:1811.00348·cs.SD·November 2, 2018·6 cites

Open Access

TL;DR

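This paper introduces a sequence-to-sequence model for keyword spotting (KWS) that simplifies the pipeline of a production-quality KWS system, meets the requirements of high accuracy, low latency, and small footprint, and outperforms a recently proposed attention-based end-to-end model on real-world wake-up data.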

Contribution

It presents a sequence-to-sequence architecture for KWS that is small-footprint and low-latency, and that outperforms a recently proposed attention-based end-to-end model.

Findings

1. Achieves approximately 3.05% false rejection rate (FRR) at 0.1 false alarms (FA) per hour with 73K parameters.

2. Outperforms a recently proposed attention-based end-to-end model.

3. Evaluates LSTM and GRU encoder architectures on real-world wake-up data.
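
To give a feel for how the choice of encoder affects a small parameter budget, the sketch below counts the weights in a single LSTM or GRU layer using the standard gate formulas. The input and hidden sizes (40 and 64) are hypothetical values chosen for illustration; they are not the configuration reported in the paper, and the count assumes one bias vector per gate.

```python
def recurrent_layer_params(input_dim: int, hidden_dim: int, num_gates: int) -> int:
    """Parameter count for one recurrent layer.

    Assumes each gate has an input weight matrix, a recurrent weight
    matrix, and a single bias vector.
    """
    per_gate = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return num_gates * per_gate


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    # LSTM: input, forget, and output gates plus the cell candidate (4 blocks).
    return recurrent_layer_params(input_dim, hidden_dim, num_gates=4)


def gru_params(input_dim: int, hidden_dim: int) -> int:
    # GRU: update and reset gates plus the candidate state (3 blocks).
    return recurrent_layer_params(input_dim, hidden_dim, num_gates=3)


# Hypothetical sizes, for illustration only (not taken from the paper).
INPUT_DIM, HIDDEN_DIM = 40, 64
print("LSTM layer:", lstm_params(INPUT_DIM, HIDDEN_DIM))
print("GRU layer: ", gru_params(INPUT_DIM, HIDDEN_DIM))
```

At equal sizes a GRU layer needs three quarters of the parameters of an LSTM layer under this convention. Some frameworks keep two bias vectors per gate, which raises both counts slightly.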

Abstract

In this paper, we propose a sequence-to-sequence model for keyword spotting (KWS). Compared with other end-to-end architectures for KWS, our model simplifies the pipeline of a production-quality KWS system and satisfies the requirements of high accuracy, low latency, and small footprint. We also evaluate the performance of different encoder architectures, including LSTM and GRU. Experiments on real-world wake-up data show that our approach outperforms the recently proposed attention-based end-to-end model. Specifically, with 73K parameters, our sequence-to-sequence model achieves a false rejection rate (FRR) of $\sim$3.05\% at 0.1 false alarms (FA) per hour.
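
The operating point quoted in the abstract (FRR at a fixed number of false alarms per hour) can be illustrated with a minimal sketch. It assumes one confidence score per keyword utterance and one score per candidate detection in keyword-free audio; the function name, the scores, and the ten hours of negative audio are all hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple


def frr_at_fa_per_hour(
    positive_scores: Sequence[float],
    negative_scores: Sequence[float],
    negative_hours: float,
    target_fa_per_hour: float,
) -> Tuple[float, float]:
    """Return (threshold, FRR) at the given false-alarm budget.

    A detection fires when score > threshold. The threshold is the lowest
    one that keeps false alarms on the negative audio within the budget.
    """
    # Number of false alarms allowed over the whole negative set
    # (the small epsilon guards against floating-point round-off).
    max_false_alarms = math.floor(target_fa_per_hour * negative_hours + 1e-9)

    ranked = sorted(negative_scores, reverse=True)
    if max_false_alarms >= len(ranked):
        threshold = float("-inf")  # every negative may fire within budget
    else:
        # At most max_false_alarms negatives score strictly above this value.
        threshold = ranked[max_false_alarms]

    # A keyword utterance is falsely rejected when it does not fire.
    rejected = sum(1 for score in positive_scores if score <= threshold)
    return threshold, rejected / len(positive_scores)


# Hypothetical scores: 4 keyword utterances, 5 candidates in 10 hours of negatives.
threshold, frr = frr_at_fa_per_hour(
    positive_scores=[0.90, 0.80, 0.25, 0.95],
    negative_scores=[0.10, 0.20, 0.70, 0.30, 0.05],
    negative_hours=10.0,
    target_fa_per_hour=0.1,
)
print(f"threshold={threshold}, FRR={frr:.2%}")
```

With a budget of 0.1 FA per hour over ten hours, one false alarm is allowed, so the threshold settles at the second-highest negative score and one of the four keyword utterances is rejected.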

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Advanced Text Analysis Techniques