WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit
Jie Wang, Menglong Xu, Jingyong Hou, Binbin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

TL;DR
WeKws is a production-ready, efficient end-to-end keyword spotting toolkit that simplifies training and deployment, achieving competitive results on multiple datasets for real-world speech interaction applications.
Contribution
It introduces a practical, easy-to-use E2E KWS toolkit whose refined max-pooling loss lets the model learn the ending position of the keyword by itself, narrowing the gap between research and deployment.
Findings
Achieves highly competitive results on three publicly available datasets.
Simplifies the training pipeline with a refined max-pooling loss, so the model learns the keyword's ending position by itself.
Enables efficient real-world deployment.
Abstract
Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and convenient-to-be-applied E2E KWS toolkit. WeKws contains the implementations of several state-of-the-art backbone networks, making it achieve highly competitive results on three publicly available datasets. To make WeKws a pure E2E toolkit, we utilize a refined max-pooling loss to make the model learn the ending position of the keyword by itself, which significantly simplifies the training pipeline and makes WeKws very efficient to be applied in real-world scenarios. The toolkit is publicly available at…
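The abstract's point about the model learning the keyword's ending position by itself can be illustrated with a minimal sketch of the general max-pooling loss idea. This is not the toolkit's implementation and does not reproduce the paper's refinement; the function names, the single-keyword setup, and the per-frame posterior input are assumptions made for illustration. The loss looks only at the highest-scoring frame of each utterance, so no frame-level alignment or ending-position label is needed.

```python
import math

def max_pooling_loss(posteriors, is_keyword, eps=1e-8):
    """Illustrative max-pooling loss for one utterance (hypothetical helper).

    posteriors: per-frame keyword probabilities in [0, 1].
    is_keyword: True if the utterance contains the keyword.
    """
    if not posteriors:
        raise ValueError("posteriors must contain at least one frame")
    peak = max(posteriors)  # only the best-scoring frame contributes
    if is_keyword:
        # Positive utterance: push the peak frame's probability towards 1.
        return -math.log(max(peak, eps))
    # Negative utterance: push the peak frame's probability towards 0,
    # which bounds every other frame as well.
    return -math.log(max(1.0 - peak, eps))

def peak_frame(posteriors):
    """Index of the frame the loss selects; the model chooses it, not a label."""
    return max(range(len(posteriors)), key=lambda t: posteriors[t])
```

For example, `peak_frame([0.1, 0.9, 0.2])` selects frame 1, and the positive-utterance loss is computed from that frame alone. Because the selected frame is whatever the model scores highest, training of this kind needs only utterance-level labels.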
Taxonomy
Topics: ICT in Developing Communities · Speech and dialogue systems · Topic Modeling
