Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting

Zhenyu Wang; Li Wan; Biqiao Zhang; Yiteng Huang; Shang-Wen Li; Ming Sun; Xin Lei; Zhaojun Yang

arXiv:2408.13355·cs.SD·August 27, 2024

TL;DR

This paper introduces datasource-aware disentangled learning with adversarial examples to make small-footprint keyword spotting models more robust across acoustic environments. It reports a 40.31% reduction in false reject rate at 1% false accept rate on an internal dataset and 98.06% accuracy on the Google Speech Commands V1 dataset.

Contribution

It presents a novel adversarial training method that reduces data mismatch issues in keyword spotting, leading to more robust models for on-device speech recognition.
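
As a rough illustration of what an adversarial example is, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic model. This is an assumption made for illustration: the page does not say how the paper generates its adversarial examples, and the model, the weights, and the names `bce_loss` and `fgsm_example` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(x, y, w, b):
    """Binary cross-entropy of a toy logistic 'keyword detector' on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm_example(x, y, w, b, eps):
    """Return x + eps * sign(dLoss/dx) for the toy logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For logistic regression with BCE loss, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical toy values, not taken from the paper.
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.5], 1
x_adv = fgsm_example(x, y, w, b, eps=0.1)
```

The perturbed input `x_adv` stays within `eps` of `x` in every coordinate but has a higher loss than `x`. Adversarial training in general mixes such inputs into the training data.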

Findings

01

40.31% reduction in false reject rate at 1% false accept rate on the internal dataset, compared with the strongest baseline

02

98.06% accuracy on the Google Speech Commands V1 dataset

03

Improved robustness against unseen acoustic conditions
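
The first finding is a false reject rate (FRR) measured at a fixed false accept rate (FAR). The sketch below shows one common way to compute such an operating point from detection scores, and how a relative reduction is calculated. It assumes a sample is accepted when its score is at or above the threshold and that the reported figure is a relative reduction. The function names and all scores are hypothetical and are not the paper's data.

```python
def frr_at_far(pos_scores, neg_scores, target_far):
    """Return (frr, threshold) at the lowest threshold whose FAR <= target_far.

    pos_scores: scores for utterances that contain the keyword.
    neg_scores: scores for utterances that do not.
    A sample is accepted when score >= threshold.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    # FAR never increases as the threshold rises, so scan upward and stop
    # at the first threshold that meets the FAR target.
    candidates = sorted(set(pos_scores) | set(neg_scores)) + [float("inf")]
    for thr in candidates:
        far = sum(s >= thr for s in neg_scores) / len(neg_scores)
        if far <= target_far:
            frr = sum(s < thr for s in pos_scores) / len(pos_scores)
            return frr, thr


def relative_reduction(baseline, improved):
    """Fractional reduction of a rate relative to its baseline value."""
    return (baseline - improved) / baseline


# Hypothetical scores for illustration only.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.1, 0.2, 0.3, 0.6]
frr, thr = frr_at_far(pos, neg, target_far=0.0)
```

With these toy scores the threshold lands at 0.7 and one of the four positives is rejected, so the FRR is 0.25. As a separate example, `relative_reduction(0.10, 0.06)` gives 0.4, that is, a 40% relative reduction.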

Abstract

A keyword spotting (KWS) engine that is continuously running on device is exposed to various speech signals that are usually unseen before. It is a challenging problem to build a small-footprint and high-performing KWS model with robustness under different acoustic environments. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose datasource-aware disentangled learning with adversarial examples to reduce the mismatch between the original and adversarial data as well as the mismatch across original training datasources. The KWS model architecture is based on depth-wise separable convolution and a simple attention module. Experimental results demonstrate that the proposed learning strategy improves false reject rate by 40.31% at 1% false accept rate on the internal dataset, compared to the strongest baseline without using…
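
The abstract names depth-wise separable convolution as the basis of the small-footprint model. The sketch below compares weight counts for a standard 2D convolution and a depth-wise separable one, to show why the latter suits small models. It assumes square kernels and ignores biases. The layer sizes are hypothetical and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2D convolution with a kernel x kernel filter."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depth-wise conv (one filter per input channel)
    followed by a 1x1 point-wise conv that mixes channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input and 64 output channels.
standard = standard_conv_params(3, 64, 64)          # 36864
separable = depthwise_separable_params(3, 64, 64)   # 4672
```

For this hypothetical layer the separable form needs roughly one eighth of the weights of the standard convolution.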

Taxonomy

Topics: User Authentication and Security Systems · Biometric Identification and Security · Deception detection and forensic psychology

Methods: Softmax · Attention Is All You Need · Convolution