Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting
Zhenyu Wang, Li Wan, Biqiao Zhang, Yiteng Huang, Shang-Wen Li, Ming Sun, Xin Lei, Zhaojun Yang

TL;DR
This paper introduces a datasource-aware disentangled learning approach with adversarial examples to enhance the robustness of small-footprint keyword spotting models across varying acoustic environments, achieving significant improvements in accuracy and false reject rates.
Contribution
It presents a novel adversarial training method that reduces data mismatch issues in keyword spotting, leading to more robust models for on-device speech recognition.
Findings
40.31% reduction in false reject rate at 1% false accept rate
Achieved 98.06% accuracy on Google Speech Commands V1 dataset
Improved robustness against unseen acoustic conditions
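The first finding is reported at a fixed operating point: the decision threshold is chosen so that the false accept rate (FAR) is 1%, and the false reject rate (FRR) is read off at that threshold. The sketch below shows one way to compute that measurement, assuming each utterance has a confidence score where higher means "keyword present". The function name `frr_at_far`, the strict-greater acceptance rule, and the argument layout are illustrative assumptions, not the authors' evaluation code.

```python
import math


def frr_at_far(pos_scores, neg_scores, target_far=0.01):
    """Return (threshold, far, frr) at the lowest threshold with FAR <= target_far.

    pos_scores: scores for utterances that contain the keyword.
    neg_scores: scores for utterances that do not contain the keyword.
    An utterance is accepted when its score is strictly greater than the threshold.
    """
    n_neg = len(neg_scores)
    # Largest number of negatives that may be accepted without exceeding the target FAR.
    max_false_accepts = math.floor(target_far * n_neg + 1e-9)
    ranked = sorted(neg_scores, reverse=True)
    if max_false_accepts >= n_neg:
        threshold = float("-inf")  # every negative may be accepted
    else:
        # At most max_false_accepts negatives score strictly above this value.
        threshold = ranked[max_false_accepts]
    far = sum(s > threshold for s in neg_scores) / n_neg
    frr = sum(s <= threshold for s in pos_scores) / len(pos_scores)
    return threshold, far, frr
```

Comparing two models this way means running `frr_at_far` on each model's scores over the same test set and comparing the resulting FRR values at the shared 1% FAR operating point.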
Abstract
A keyword spotting (KWS) engine that runs continuously on device is exposed to a wide variety of speech signals, most of which it has not seen before. It is challenging to build a small-footprint, high-performing KWS model that is robust across different acoustic environments. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose datasource-aware disentangled learning with adversarial examples to reduce the mismatch between the original and adversarial data as well as the mismatch across original training datasources. The KWS model architecture is based on depth-wise separable convolution and a simple attention module. Experimental results demonstrate that the proposed learning strategy improves false reject rate by 40.31% at 1% false accept rate on the internal dataset, compared to the strongest baseline without using…
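To make the term "adversarial example" concrete, the sketch below perturbs an input in the direction that increases the loss, using the fast gradient sign method (FGSM). Both the choice of FGSM and the toy logistic-regression classifier are assumptions made for illustration; the abstract does not state which attack or perturbation size the authors use, and the actual model is a convolutional network with attention. All names (`fgsm_example`, `bce_loss`, `eps`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that feature vector x contains the keyword (toy linear model)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w, b, x, y):
    """Binary cross-entropy loss for a single example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm_example(w, b, x, y, eps):
    """Return x + eps * sign(dLoss/dx), the FGSM adversarial version of x."""
    p = predict(w, b, x)
    # For logistic regression with cross-entropy loss, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * si for xi, si in zip(x, sign)]


# Toy usage: the perturbed input stays within eps of the original per feature,
# but the loss on it is higher than on the clean input.
w, b = [1.0, -2.0, 0.5], 0.1
x, y = [0.2, -0.1, 0.4], 1
x_adv = fgsm_example(w, b, x, y, eps=0.1)
print(bce_loss(w, b, x, y), bce_loss(w, b, x_adv, y))
```

In adversarial training, examples like `x_adv` are fed to the model alongside the clean data. The mismatch between the two kinds of input is what the proposed disentangled learning is meant to reduce.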
Taxonomy
Topics: User Authentication and Security Systems · Biometric Identification and Security · Deception detection and forensic psychology
Methods: Softmax · Attention Is All You Need · Convolution
