AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data
Menglong Xu, Shengqiang Li, Chengdong Liang, Xiao-Lei Zhang

TL;DR
This paper introduces an AUC-based loss function to improve the robustness and accuracy of small-footprint keyword spotting neural networks, especially with limited training data and unseen sounds.
Contribution
It proposes an AUC-maximization loss function that improves robustness by jointly optimizing closed-set keyword classification accuracy and the detection of non-keyword segments.
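The AUC being maximized can be read as the probability that a randomly chosen positive segment receives a higher score than a randomly chosen negative one. The sketch below computes that empirical quantity in its Wilcoxon-Mann-Whitney form. It assumes that keyword segments are the positives and non-keyword segments are the negatives. The function name `empirical_auc` is hypothetical and is not taken from the paper.

```python
import numpy as np

def empirical_auc(pos_scores, neg_scores):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly.

    Assumption: positives are keyword segments, negatives are non-keyword
    segments, and each has a scalar detector score. Ties count as 1/2.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]  # shape (P, 1)
    neg = np.asarray(neg_scores, dtype=float)[None, :]  # shape (1, N)
    # Broadcasting compares every positive with every negative: P x N pairs.
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

print(empirical_auc([0.9, 0.8, 0.4], [0.3, 0.5]))  # → 0.8333... (5 of 6 pairs ranked correctly)
```

This count is piecewise constant in the scores, so it cannot be optimized by gradient descent directly. An AUC-based loss therefore has to use some smooth surrogate of it.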
Findings
Achieves state-of-the-art results on the Google Speech Commands datasets.
Improves detection robustness with limited training data.
Enhances performance on unseen sounds.
Abstract
Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS). However, when training data is limited, it remains challenging to achieve robust and highly accurate KWS in real-world scenarios, where unseen sounds that lie outside the training data are frequently encountered. Most conventional methods aim to maximize classification accuracy on the training set without taking unseen sounds into account. To enhance the robustness of deep-neural-network-based KWS, in this paper we introduce a new loss function based on maximizing the area under the receiver operating characteristic curve (AUC). The proposed method not only maximizes the classification accuracy of keywords on the closed training set but also maximizes the AUC score to optimize the detection of non-keyword segments. Experimental results on the Google Speech…
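The abstract describes a loss with two terms. One is a closed-set classification term on keywords. The other is an AUC term for separating keyword segments from non-keyword segments. The sketch below is a forward-only NumPy illustration of that idea under stated assumptions. It is not the authors' exact formulation.

The following choices are all hypothetical:

- the label convention, with `-1` marking a non-keyword segment;
- the detection score, taken as the maximum keyword posterior;
- the pairwise sigmoid surrogate for 1 − AUC;
- the weight `lam` and the temperature `tau`.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax along the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def kws_auc_loss(logits, labels, lam=1.0, tau=0.1):
    """Hypothetical combined loss: keyword cross-entropy + smooth (1 - AUC) surrogate.

    logits: (B, K) scores over K keyword classes.
    labels: (B,) keyword index in [0, K), or -1 for a non-keyword segment (assumed convention).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    logp = log_softmax(logits)
    kw_idx = np.flatnonzero(labels >= 0)
    nk_idx = np.flatnonzero(labels < 0)
    if kw_idx.size == 0 or nk_idx.size == 0:
        raise ValueError("batch needs both keyword and non-keyword segments")

    # Closed-set term: cross-entropy computed on keyword samples only.
    ce = -logp[kw_idx, labels[kw_idx]].mean()

    # Detection score per segment: maximum keyword posterior (an assumed choice).
    score = np.exp(logp).max(axis=1)
    pos = score[kw_idx][:, None]  # keyword segments, shape (P, 1)
    neg = score[nk_idx][None, :]  # non-keyword segments, shape (1, N)

    # Smooth surrogate of 1 - AUC: sigmoid of each pairwise mis-ranking margin.
    auc_surrogate = (1.0 / (1.0 + np.exp(-(neg - pos) / tau))).mean()
    return ce + lam * auc_surrogate

logits = np.array([[4.0, 0.0], [0.0, 4.0], [0.1, 0.0]])
labels = np.array([0, 1, -1])
print(kws_auc_loss(logits, labels))
```

The surrogate term falls as non-keyword segments receive lower keyword confidence than true keywords. This is the ranking property that AUC measures. A real training setup would implement the same expression in an autodiff framework so that both terms can be backpropagated.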
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
