Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting
Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, Angelo Scorza Scarpati, Liam Fowl, Quan Wang

TL;DR
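This paper adapts a sequence-to-sequence Transformer-Transducer ASR model to keyword spotting (KWS) by replacing the keyword in the transcription with a special <kw> token and training with a KWS-specific loss adapted from Minimum Bayes-Risk training. The approach significantly outperforms ASR-based KWS systems and performs similarly to a conventional KWS system.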
Contribution
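The paper adapts a Transformer-Transducer ASR model to KWS in three ways: it replaces the keyword in the text transcription with a special <kw> token, it applies a decision function inspired by conventional KWS approaches at inference time, and it introduces a keyword spotting loss adapted from sequence-discriminative Minimum Bayes-Risk training.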
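The transcript-rewriting step can be illustrated with a minimal sketch. The paper only states that the keyword is replaced with the <kw> token; the function name `replace_keyword`, the regex-based matching, the case-insensitive behavior, and the example phrase "hey assistant" are all assumptions made here for illustration, not the authors' implementation.

```python
import re

KW_TOKEN = "<kw>"  # the special token named in the abstract


def replace_keyword(transcript: str, keyword: str, token: str = KW_TOKEN) -> str:
    """Replace each whole-phrase occurrence of `keyword` with `token`.

    Hypothetical helper: matching is case-insensitive and tolerant of
    extra whitespace between the words of a multi-word keyword.
    """
    words = [re.escape(w) for w in keyword.split()]
    if not words:
        # Nothing to replace for an empty keyword.
        return transcript
    # Lookarounds ensure the keyword is not matched inside a longer word.
    pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
    return re.sub(pattern, token, transcript, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(replace_keyword("hey assistant play some music", "hey assistant"))
```

Applied to every training transcript, a helper like this would yield targets in which the keyword phrase is represented by a single token, which is what lets the model be trained to emit <kw> directly.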
Findings
Outperforms traditional ASR-based KWS systems
Achieves performance similar to a conventional KWS system while adding the flexibility of sequence-to-sequence training
Can improve existing KWS systems when combined
Abstract
In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token <kw> and training the system to detect the <kw> token in an audio stream. At inference time, we create a decision function inspired by conventional KWS approaches, to make our approach more suitable for the KWS task. Furthermore, we introduce a specific keyword spotting loss by adapting the sequence-discriminative Minimum Bayes-Risk training technique. We find that our approach significantly outperforms ASR-based KWS systems. When compared with a conventional keyword spotting system, our proposal has similar performance while bringing the advantages and flexibility of sequence-to-sequence training. Additionally, when combined with the conventional KWS…
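The abstract does not spell out the inference-time decision function, so the following is only a sketch of one common pattern from conventional KWS: smooth a per-frame keyword score over a sliding window and fire when the average crosses a threshold. The assumption that a per-frame <kw> posterior is available, as well as the names `detect_keyword`, `window`, and `threshold`, are hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence


def detect_keyword(
    kw_posteriors: Sequence[float],
    window: int = 3,
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the first frame index where the windowed mean of the
    <kw> posterior reaches `threshold`, or None if it never does.

    Assumed setup: `kw_posteriors[i]` is the model's score for the
    <kw> token at frame i (a stand-in for the paper's actual scores).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    running = 0.0
    for i, p in enumerate(kw_posteriors):
        running += p
        if i >= window:
            # Drop the frame that just left the sliding window.
            running -= kw_posteriors[i - window]
        # Only decide once a full window of frames has been seen.
        if i + 1 >= window and running / window >= threshold:
            return i
    return None


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.9, 0.9, 0.9]
    print(detect_keyword(scores, window=3, threshold=0.6))
```

Smoothing trades a little latency for robustness: a single spurious high-score frame does not trigger a detection, while a sustained run of high scores does.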
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
