Very Fast Keyword Spotting System with Real Time Factor below 0.01

Jan Nouza; Petr Cerva; Jindrich Zdansky

arXiv:2007.10706·eess.AS·September 9, 2020

TL;DR

This paper introduces a highly optimized, neural-network-based keyword spotting (KWS) system that runs with a real-time factor below 0.01 and performs well on several types of speech data.

Contribution

The paper presents an efficient KWS architecture that models speech with bidirectional feedforward sequential memory networks (an alternative to recurrent nets) and decodes speech frames in an entirely forward manner, with minimal need for look-back.

Findings

01 Real-time (RT) factor close to 0.001 when all optimizations are applied

02 Evaluated on 3 large Czech datasets (broadcast, internet and telephone speech; 17 hours in total)

03 Outperforms previous systems in speed and efficiency

Abstract

In this paper, we present the architecture of a keyword spotting (KWS) system that is based on modern neural networks, performs well on various types of speech data, and runs very fast. We focus mainly on the last aspect and propose optimizations for all the steps required in a KWS design: signal processing and likelihood computation, Viterbi decoding, spot candidate detection, and confidence calculation. We present time- and memory-efficient modelling by bidirectional feedforward sequential memory networks (an alternative to recurrent nets), using either standard triphones or so-called quasi-monophones, and an entirely forward decoding of speech frames (with minimal need for look-back). Several variants of the proposed scheme are evaluated on 3 large Czech datasets (broadcast, internet and telephone, 17 hours in total) and their performance is compared by Detection Error Tradeoff…
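To put the headline numbers in perspective, the real-time factor is commonly defined as processing time divided by audio duration. The sketch below applies that usual definition to the 17 hours of test audio mentioned in the abstract. The function names are hypothetical, and the resulting times are plain arithmetic from the stated factors, not timings reported by the authors.

```python
# Minimal sketch, assuming the common definition:
#   real-time factor = processing time / audio duration.
# Function names are illustrative and do not come from the paper.

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def processing_budget(audio_seconds: float, rtf: float) -> float:
    """Return the processing time implied by a given real-time factor."""
    if rtf < 0:
        raise ValueError("real-time factor must be non-negative")
    return audio_seconds * rtf


# 17 hours of audio in total, as stated in the abstract.
TOTAL_AUDIO_SECONDS = 17 * 3600

budget_at_001 = processing_budget(TOTAL_AUDIO_SECONDS, 0.01)
budget_at_0001 = processing_budget(TOTAL_AUDIO_SECONDS, 0.001)

if __name__ == "__main__":
    print(f"RT factor 0.01:  about {budget_at_001 / 60:.1f} minutes of processing")
    print(f"RT factor 0.001: about {budget_at_0001:.0f} seconds of processing")
```

Under this definition, a factor of 0.01 corresponds to roughly ten minutes of computation for the whole 17-hour collection, and a factor near 0.001 to roughly one minute.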
