Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting

Zhehuai Chen; Yanmin Qian; Kai Yu

arXiv:1808.00639·cs.CL·August 20, 2018

TL;DR

This paper introduces a sequence discriminative training framework for deep learning based acoustic keyword spotting, significantly improving performance over previous frame-level methods in both fixed vocabulary and unrestricted tasks.

Contribution

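It proposes sequence discriminative training approaches for acoustic KWS using word-independent lattices and non-keyword symbols. Earlier sequence discriminative training studies for KWS were limited to fixed vocabulary or LVCSR-based methods and were not compared against state-of-the-art deep learning based KWS approaches.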

Findings

01 Achieved consistent performance improvements in fixed vocabulary KWS

02 Demonstrated significant gains in unrestricted KWS tasks

03 Validated effectiveness of sequence discriminative training over frame-level methods

Abstract

Speech recognition is a sequence prediction problem. Besides employing various deep learning approaches for frame-level classification, sequence-level discriminative training has proven indispensable for achieving state-of-the-art performance in large vocabulary continuous speech recognition (LVCSR). However, keyword spotting (KWS), one of the most common speech recognition tasks, has benefited almost exclusively from frame-level deep learning because of the difficulty of obtaining competing sequence hypotheses. The few studies on sequence discriminative training for KWS are limited to fixed vocabulary or LVCSR-based methods and have not been compared to state-of-the-art deep learning based KWS approaches. In this paper, a sequence discriminative training framework is proposed for both fixed vocabulary and unrestricted acoustic KWS. Sequence discriminative training for both…
