Data Augmentation for Robust Keyword Spotting under Playback Interference

Anirudh Raju; Sankaran Panchapagesan; Xing Liu; Arindam Mandal; Nikko Strom

arXiv:1808.00563·cs.CL·August 3, 2018·31 cites

TL;DR

This paper introduces a data augmentation method that improves on-device keyword spotting under ambient noise and playback interference by mixing music and TV/movie audio into the training audio at different signal-to-interference ratios.

Contribution

The paper proposes a data augmentation strategy that makes keyword spotting more robust to ambient noise and to the residual echo left by imperfect acoustic echo cancellation.

Findings

01  Around 30-45% relative reduction in false reject rates

02  Improved robustness under playback interference

03  Effective in real-world conditions
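
The abstract reports the improvement as a relative reduction, which measures the change against the baseline rate rather than as an absolute difference in percentage points. A short worked definition follows; the symbol names and the numbers in the example are illustrative assumptions, not figures from the paper.

```latex
\text{relative reduction}
  = \frac{\mathrm{FRR}_{\text{baseline}} - \mathrm{FRR}_{\text{augmented}}}
         {\mathrm{FRR}_{\text{baseline}}}
% Hypothetical example: a drop from 10% to 6% FRR gives
% (0.10 - 0.06) / 0.10 = 0.40, i.e. a 40% relative reduction
% (but only 4 percentage points in absolute terms).
```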

Abstract

Accurate on-device keyword spotting (KWS) with low false accept and false reject rates is crucial to customer experience for far-field voice control of conversational agents. It is particularly challenging to maintain a low false reject rate in real-world conditions where there is (a) ambient noise from external sources such as TV, household appliances, or other speech that is not directed at the device, and (b) imperfect cancellation of the audio playback from the device, resulting in residual echo after processing by the Acoustic Echo Cancellation (AEC) system. In this paper, we propose a data augmentation strategy to improve keyword spotting performance under these challenging conditions. The training set audio is artificially corrupted by mixing in music and TV/movie audio at different signal-to-interference ratios. Our results show that we get around 30-45% relative reduction in…
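
The mixing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `mean_power` and `mix_at_sir`, the random crop of the interference clip, and the SIR grid of 0, 5, and 10 dB are all assumptions made for the example. It uses plain Python lists so it runs without third-party packages; real audio code would more likely operate on NumPy arrays.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a non-empty signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_sir(speech, interference, sir_db, rng=random):
    """Add interference to speech at a target signal-to-interference ratio.

    Hypothetical helper: both inputs are non-empty sequences of float
    samples at the same sampling rate.
    """
    n = len(speech)
    # Loop the interference clip if it is shorter than the utterance.
    reps = -(-n // len(interference))  # ceiling division
    looped = list(interference) * reps
    # Take a random crop so each utterance sees a different excerpt.
    start = rng.randrange(len(looped) - n + 1)
    segment = looped[start:start + n]

    p_speech = mean_power(speech)
    p_interf = mean_power(segment)
    if p_speech == 0.0 or p_interf == 0.0:
        return list(speech)  # nothing to scale against

    # Choose gain so that 10*log10(p_speech / (gain^2 * p_interf)) == sir_db.
    gain = math.sqrt(p_speech / (p_interf * 10.0 ** (sir_db / 10.0)))
    return [s + gain * i for s, i in zip(speech, segment)]


# Usage with synthetic signals standing in for real recordings.
rng = random.Random(0)
speech = [rng.gauss(0.0, 1.0) for _ in range(1600)]
music = [rng.gauss(0.0, 1.0) for _ in range(4000)]
sir_db = rng.choice([0.0, 5.0, 10.0])  # assumed SIR grid, not from the paper
mixed = mix_at_sir(speech, music, sir_db, rng)
```

Sampling the SIR per utterance, as in the usage lines, is one way to expose the model to a range of interference levels during training; the paper's actual SIR values and sampling scheme are not given in the abstract excerpt above.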

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques