Speech Enhancement for Wake-Up-Word detection in Voice Assistants

David Bonet; Guillermo Cámbara; Fernando López; Pablo Gómez; Carlos Segura; Jordi Luque

arXiv:2101.12732·eess.AS·February 1, 2021

TL;DR

This paper introduces a waveform-level speech enhancement model tailored for wake-up-word detection in voice assistants, significantly improving noise robustness without harming performance in quiet settings.

Contribution

The paper presents a fully convolutional denoising auto-encoder that operates at the waveform level and is trained jointly with a wake-up-word (WUW) classifier for challenging noisy environments.

Findings

1. Enhanced detection accuracy in noisy conditions
2. No negative impact on quiet environment performance
3. Joint training improves robustness significantly

Abstract

Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants. A very common issue with voice assistants is that they are easily activated by background noise such as music, TV or background speech that accidentally triggers the device. In this paper, we propose a Speech Enhancement (SE) model adapted to the task of WUW detection that aims at increasing the recognition rate and reducing false alarms in the presence of these types of noise. The SE model is a fully-convolutional denoising auto-encoder at waveform level and is trained using log-Mel spectrogram and waveform reconstruction losses together with the BCE loss of a simple WUW classification network. A new database has been purposely prepared for the task of recognizing the WUW in challenging conditions, containing negative samples that are very phonetically similar to the keyword.…
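
The training objective described in the abstract combines three terms: a waveform reconstruction loss, a log-Mel spectrogram reconstruction loss, and the BCE loss of the WUW classifier. The following is a minimal sketch of how such a combined objective could be computed for a single example. It is not the authors' implementation: the use of a mean absolute (L1) distance for both reconstruction terms, the weights `w_wave`, `w_mel` and `w_bce`, and all function names are assumptions made for illustration, and the log-Mel features are assumed to be precomputed and passed in as flat lists.

```python
import math


def l1_distance(a, b):
    """Mean absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def joint_loss(clean_wave, enhanced_wave, clean_logmel, enhanced_logmel,
               wuw_prob, wuw_label, w_wave=1.0, w_mel=1.0, w_bce=1.0):
    """Weighted sum of the three loss terms (the weights are hypothetical)."""
    # Reconstruction error on raw samples (L1 is an assumption).
    wave_term = l1_distance(clean_wave, enhanced_wave)
    # Reconstruction error on precomputed log-Mel features (L1 is an assumption).
    mel_term = l1_distance(clean_logmel, enhanced_logmel)
    # Classification error of the WUW detector on the enhanced signal.
    cls_term = bce(wuw_prob, wuw_label)
    return w_wave * wave_term + w_mel * mel_term + w_bce * cls_term
```

With a perfect reconstruction, both reconstruction terms vanish and only the classifier term remains, so the enhancement model is still pushed toward outputs that the WUW classifier labels correctly.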
