On Front-end Gain Invariant Modeling for Wake Word Spotting

Yixin Gao; Noah D. Stein; Chieh-Chi Kao; Yunliang Cai; Ming Sun; Tao Zhang; Shiv Vitaladevuni

arXiv:2010.06676·eess.AS·October 15, 2020

TL;DR

This paper introduces a $\Delta$LFBE feature for wake word spotting that decouples audio front-end gain variations from the wake word model, improving robustness across devices and acoustic conditions.

Contribution

The paper proposes a novel $\Delta$LFBE feature and neural network modifications to improve wake word spotting robustness against AFE gain variations.

Findings

01

$\Delta$LFBE maintains performance with up to $\pm$12 dB gain changes.

02

Baseline CNN model's false alarm rate increases by 19.0% without $\Delta$LFBE.

03

$\Delta$LFBE-based models show no performance loss under gain variations.

Abstract

Wake word (WW) spotting is challenging in far-field conditions due to the complexities and variations in acoustic conditions and the environmental interference in signal transmission. A suite of carefully designed and optimized audio front-end (AFE) algorithms helps mitigate these challenges and provides better quality audio signals to downstream modules such as the WW spotter. Since the WW model is trained with AFE-processed audio data, its performance is sensitive to AFE variations, such as gain changes. In addition, when deploying to new devices, WW performance is not guaranteed because the AFE is unknown to the WW model. To address these issues, we propose a novel approach that uses a new feature called $\Delta$LFBE to decouple the AFE gain variations from the WW model. We modified the neural network architectures to accommodate the delta computation, with the feature extraction module…
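
The gain-decoupling idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes the delta is a first-order difference between adjacent frames, and it uses a single full-band log energy per frame as a stand-in for one log filter bank energy (LFBE) band. The helper names `log_energies` and `delta` and the toy input are hypothetical. Under those assumptions, a constant gain adds the same offset to every log energy, and the difference cancels it.

```python
import math
import random


def log_energies(frames, gain=1.0):
    """Per-frame log energy of the signal scaled by a linear gain.

    Assumption: one full-band energy stands in for a single LFBE band.
    """
    return [math.log(sum((gain * x) ** 2 for x in frame)) for frame in frames]


def delta(features):
    """First-order difference between adjacent frames (assumed delta definition)."""
    return [b - a for a, b in zip(features, features[1:])]


random.seed(0)
# Hypothetical toy input: 5 frames of 160 Gaussian samples each.
frames = [[random.gauss(0.0, 1.0) for _ in range(160)] for _ in range(5)]

gain_db = 12.0
gain = 10 ** (gain_db / 20.0)  # +12 dB expressed as a linear amplitude factor

base = log_energies(frames)
boosted = log_energies(frames, gain)

# Each log energy shifts by the same constant, 2 * ln(gain).
offset = [b - a for a, b in zip(base, boosted)]

# The frame-to-frame difference removes that constant shift.
max_delta_gap = max(abs(x - y) for x, y in zip(delta(base), delta(boosted)))
```

Here `offset` holds the same value, `2 * math.log(gain)`, for every frame, while `max_delta_gap` is zero up to floating-point rounding. This only covers a gain that stays constant over the differenced frames; a time-varying gain would not cancel in the same way.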

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis