Noisy student-teacher training for robust keyword spotting
Hyun-Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya

TL;DR
This paper introduces a self-training approach with a noisy student-teacher methodology for streaming keyword spotting, leveraging large-scale unlabeled data and aggressive spectral augmentation to significantly enhance robustness under challenging conditions.
Contribution
It presents a novel self-training framework that combines noisy student-teacher training with aggressive data augmentation for improved keyword spotting accuracy.
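A minimal sketch of the loss at the heart of student-teacher self-training follows. It assumes a classification-style keyword spotter that emits per-class logits, and that the teacher supplies soft (posterior) targets, as the abstract's contrast with "hard-labeled data" suggests. The "noisy" part is assumed to happen upstream, where spectral augmentation is applied to the inputs before these logits are computed. All function names and the 3-class setup are hypothetical, and the paper's streaming model and exact loss may differ.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over per-class logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def softmax(logits: Sequence[float]) -> List[float]:
    """Class posteriors; used here to turn teacher logits into soft targets."""
    return [math.exp(v) for v in log_softmax(logits)]


def soft_cross_entropy(student_logits: Sequence[float],
                       teacher_probs: Sequence[float]) -> float:
    """Self-training loss (assumed form): -sum_k p_teacher[k] * log p_student[k]."""
    log_q = log_softmax(student_logits)
    return -sum(p * lq for p, lq in zip(teacher_probs, log_q))


def hard_cross_entropy(student_logits: Sequence[float], label: int) -> float:
    """Baseline supervised loss with a one-hot (hard) label."""
    return -log_softmax(student_logits)[label]


# Hypothetical logits for a 3-class spotter (e.g. keyword / other speech / silence),
# each assumed to be computed on a spectrally augmented view of one unlabeled utterance.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, -0.3, 0.2]
targets = softmax(teacher_logits)  # soft labels: no human annotation needed
loss = soft_cross_entropy(student_logits, targets)
```

With a one-hot target, `soft_cross_entropy` reduces exactly to `hard_cross_entropy`. The soft version instead lets the teacher express uncertainty, for example when augmentation has masked part of the keyword, rather than forcing a confident label onto a heavily distorted input.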
Findings
Aggressive spectral augmentation degrades supervised training performance.
Noisy student-teacher self-training improves accuracy on some difficult-condition test sets by as much as 60%.
The method effectively uses large-scale unlabeled data for robust streaming keyword spotting.
Abstract
We propose self-training with a noisy student-teacher approach for streaming keyword spotting that can utilize large-scale unlabeled data and aggressive data augmentation. The proposed method applies aggressive data augmentation (spectral augmentation) to the inputs of both student and teacher and utilizes unlabeled data at scale, which significantly boosts the accuracy of the student under challenging conditions. Such aggressive augmentation usually degrades model performance when used in supervised training with hard-labeled data. Experiments show that aggressive spectral augmentation on the baseline supervised training method degrades accuracy, while the proposed noisy student-teacher self-training improves accuracy on some difficult-condition test sets by as much as 60%.
