Query-by-Example Keyword Spotting Using Spectral-Temporal Graph Attentive Pooling and Multi-Task Learning
Zhenyu Wang, Shuyu Kong, Li Wan, Biqiao Zhang, Yiteng Huang, Mumin Jin, Ming Sun, Xin Lei, Zhaojun Yang

TL;DR
This paper introduces a Query-by-Example (QbyE) keyword spotting (KWS) system that uses spectral-temporal graph attentive pooling and multi-task learning to recognize user-customized keywords.
Contribution
It proposes a new QbyE KWS framework with spectral-temporal graph attentive pooling and compares three encoder architectures, highlighting LiCoNet's efficiency and performance.
Findings
LiCoNet achieves accuracy comparable to Conformer while being about 13x more efficient.
The proposed framework helps learn embeddings that are speaker-invariant and linguistically informative.
Experiments on a substantial internal dataset of speakers support the effectiveness of the approach.
Abstract
Existing keyword spotting (KWS) systems primarily rely on predefined keyword phrases. However, the ability to recognize customized keywords is crucial for tailoring interactions with intelligent devices. In this paper, we present a novel Query-by-Example (QbyE) KWS system that employs spectral-temporal graph attentive pooling and multi-task learning. This framework aims to effectively learn speaker-invariant and linguistic-informative embeddings for QbyE KWS tasks. Within this framework, we investigate three distinct network architectures for encoder modeling: LiCoNet, Conformer and ECAPA_TDNN. The experimental results on a substantial internal dataset of speakers have demonstrated the effectiveness of the proposed QbyE framework in maximizing the potential of simpler models such as LiCoNet. Particularly, LiCoNet, which is 13x more efficient, achieves comparable performance to the…
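To make the Query-by-Example idea concrete, here is a minimal sketch of how a detection decision could be made once an encoder has turned each utterance into a fixed-size embedding. It is not the paper's scoring method: the function names, the averaging of enrollment embeddings, the cosine-similarity score, and the 0.7 threshold are all illustrative assumptions, and the encoder itself (LiCoNet, Conformer or ECAPA_TDNN) is left out.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several enrollment embeddings (hypothetical template)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def detect_keyword(
    enrolled: Sequence[Sequence[float]],
    query: Sequence[float],
    threshold: float = 0.7,  # assumed value; a real system would tune this
) -> bool:
    """Return True if the query embedding is close enough to the enrolled template."""
    template = average_embedding(enrolled)
    return cosine_similarity(template, query) >= threshold
```

In this sketch, a user would enroll a custom keyword by speaking it a few times, and each later utterance would be accepted only if its embedding scores above the threshold against the averaged enrollment template. Speaker-invariant embeddings matter here because the same keyword spoken by a different person should still land near the template.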
Taxonomy
Topics: Advanced Text Analysis Techniques · Complex Network Analysis Techniques
