Noisy student-teacher training for robust keyword spotting

Hyun-Jin Park; Pai Zhu; Ignacio Lopez Moreno; Niranjan Subrahmanya

arXiv:2106.01604 · cs.LG · June 4, 2021

TL;DR

This paper introduces a self-training approach with noisy student-teacher methodology for streaming keyword spotting, leveraging large-scale unlabeled data and aggressive spectral augmentation to significantly enhance robustness under challenging conditions.

Contribution

It presents a novel self-training framework that combines noisy student-teacher training with aggressive data augmentation for improved keyword spotting accuracy.

Findings

01. Aggressive spectral augmentation degrades supervised training performance.

02. Self-training with noisy student-teacher improves accuracy on difficult test sets by up to 60%.

03. Method effectively utilizes unlabeled data for robust streaming keyword spotting.

Abstract

We propose self-training with a noisy student-teacher approach for streaming keyword spotting that can utilize large-scale unlabeled data and aggressive data augmentation. The proposed method applies aggressive data augmentation (spectral augmentation) to the input of both student and teacher and utilizes unlabeled data at scale, which significantly boosts the accuracy of the student under challenging conditions. Such aggressive augmentation usually degrades model performance when used in supervised training with hard-labeled data. Experiments show that aggressive spectral augmentation degrades the accuracy of the baseline supervised training method, while the proposed self-training with noisy student-teacher training improves accuracy on some difficult-conditioned test sets by as much as 60%.
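The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that masked regions are filled with zeros, that the same augmented input is fed to both teacher and student, and that the student is trained with cross-entropy against the teacher's soft labels. The function names (`spec_augment`, `linear_logits`, `student_loss`), the mask sizes, and the one-layer linear "models" are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import math
import random


def spec_augment(spec, rng, num_masks=2, max_time=10, max_freq=8):
    """Return a copy of spec (list of frames, each a list of bins) with
    random time bands and frequency bands set to zero (assumed fill value)."""
    out = [list(frame) for frame in spec]
    n_frames, n_bins = len(out), len(out[0])
    for _ in range(num_masks):
        # Time mask: zero out t consecutive frames starting at t0.
        t = rng.randint(0, min(max_time, n_frames))
        t0 = rng.randint(0, n_frames - t)
        for i in range(t0, t0 + t):
            out[i] = [0.0] * n_bins
        # Frequency mask: zero out f consecutive bins in every frame.
        f = rng.randint(0, min(max_freq, n_bins))
        f0 = rng.randint(0, n_bins - f)
        for frame in out:
            frame[f0:f0 + f] = [0.0] * f
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy of the student's prediction against soft teacher labels."""
    log_p = [math.log(p + 1e-12) for p in softmax(student_logits)]
    return -sum(q * lp for q, lp in zip(teacher_probs, log_p))


def linear_logits(frame, weights):
    """Hypothetical stand-in model: weights[k] is the weight row for class k."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def student_loss(spec, teacher_w, student_w, rng):
    """Average per-frame loss on one unlabeled example.

    The teacher is treated as frozen; only its soft labels are used.
    """
    noisy = spec_augment(spec, rng)  # assumption: shared by teacher and student
    losses = []
    for frame in noisy:
        teacher_probs = softmax(linear_logits(frame, teacher_w))
        student_logits = linear_logits(frame, student_w)
        losses.append(soft_label_cross_entropy(student_logits, teacher_probs))
    return sum(losses) / len(losses)
```

No ground-truth label appears anywhere in `student_loss`, which is what lets this kind of training use unlabeled audio. In a real system the student's parameters would be updated by gradient descent on this loss, a step the sketch leaves out.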
