Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting
Zhehuai Chen, Yanmin Qian, Kai Yu

TL;DR
This paper introduces a sequence discriminative training framework for deep learning based acoustic keyword spotting, significantly improving performance over previous frame-level methods in both fixed vocabulary and unrestricted tasks.
Contribution
It proposes sequence discriminative training approaches for acoustic KWS that use word-independent lattices and non-keyword symbols, filling a gap left by earlier studies that were limited to fixed vocabulary or LVCSR based methods.
Findings
Achieved consistent performance improvements in fixed vocabulary KWS
Demonstrated significant gains in unrestricted KWS tasks
Validated effectiveness of sequence discriminative training over frame-level methods
Abstract
Speech recognition is a sequence prediction problem. Besides employing various deep learning approaches for frame-level classification, sequence-level discriminative training has proven indispensable for achieving state-of-the-art performance in large vocabulary continuous speech recognition (LVCSR). However, keyword spotting (KWS), one of the most common speech recognition tasks, has benefited almost exclusively from frame-level deep learning because competing sequence hypotheses are difficult to obtain. The few studies on sequence discriminative training for KWS are limited to fixed vocabulary or LVCSR based methods and have not been compared with state-of-the-art deep learning based KWS approaches. In this paper, a sequence discriminative training framework is proposed for both fixed vocabulary and unrestricted acoustic KWS. Sequence discriminative training for both…
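For readers unfamiliar with why competing sequence hypotheses matter, the following is a generic sketch of the maximum mutual information (MMI) criterion commonly used for sequence discriminative training in LVCSR. It is given for illustration only and is not necessarily the objective used in this paper. The symbols are assumptions introduced here: \mathbf{O}_u is the observation sequence of utterance u, \mathbf{W}_u its reference label sequence, \kappa an acoustic scaling factor, and \theta the model parameters.

```latex
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u} \log
    \frac{p_\theta(\mathbf{O}_u \mid \mathbf{W}_u)^{\kappa}\, P(\mathbf{W}_u)}
         {\sum_{\mathbf{W}} p_\theta(\mathbf{O}_u \mid \mathbf{W})^{\kappa}\, P(\mathbf{W})}
```

The denominator sums over competing hypotheses \mathbf{W}, which in LVCSR are usually approximated by a lattice. A frame-level criterion such as cross-entropy has no such term, which is why obtaining competing hypotheses is the central difficulty when carrying this kind of training over to KWS.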
