Noise-Robust Keyword Spotting through Self-supervised Pretraining
Jacob Mørk, Holger Severin Bovbjerg, Gergely Kiss, Zheng-Hua Tan

TL;DR
This paper investigates how self-supervised pretraining, especially Data2Vec, can improve the noise robustness of keyword spotting models, outperforming traditional supervised methods in noisy environments.
Contribution
The paper demonstrates that self-supervised pretraining, particularly the Data2Vec-denoising variant, makes KWS models more robust in noisy conditions than standard supervised training does.
Findings
Pretraining with self-supervised methods outperforms supervised training in clean conditions.
Pretraining with noisy data and Data2Vec-denoising significantly improves robustness in noisy environments.
Pretraining alone can surpass multi-style training in certain noisy conditions.
Abstract
Voice assistants are now widely available, and to activate them a keyword spotting (KWS) algorithm is used. Modern KWS systems are mainly trained using supervised learning methods and require a large amount of labelled data to achieve a good performance. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase the accuracy in clean conditions. This paper explores how SSL pretraining such as Data2Vec can be used to enhance the robustness of KWS models in noisy conditions, which is under-explored. Models of three different sizes are pretrained using different pretraining approaches and then fine-tuned for KWS. These models are then tested and compared to models trained using two baseline supervised learning methods, one being standard training using clean data and the other one being multi-style training (MTR). The results show that pretraining and…
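Both multi-style training and pretraining on noisy data depend on corrupting clean speech with noise. The following is a minimal sketch of one common way to do that: scaling a noise clip so the mixture reaches a target signal-to-noise ratio (SNR). It is not taken from the paper. The function names `mix_at_snr` and `measured_snr_db`, the 5 dB target, the sine-tone stand-in for speech, and the Gaussian noise are all illustrative assumptions, since this summary does not state the noise types or SNR levels the authors used. Plain Python lists keep the example dependency-free.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Repeat the noise clip if it is shorter than the utterance, then crop.
    reps = math.ceil(len(clean) / len(noise))
    noise = (list(noise) * reps)[: len(clean)]
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # Solve clean_power / (scale^2 * noise_power) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(power(clean) / (noise_power * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a mixture, given the clean reference signal."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


# Illustrative data: a 1-second 440 Hz tone at 16 kHz and a short noise clip.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

noisy = mix_at_snr(clean, noise, snr_db=5.0)
print(round(measured_snr_db(clean, noisy), 2))
```

In a multi-style setup, a routine like this would typically be applied with a randomly drawn noise clip and SNR for each training utterance, so the model sees a range of conditions rather than a single one.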
Taxonomy
Topics: Advanced Text Analysis Techniques
