Noise-Robust Keyword Spotting through Self-supervised Pretraining

Jacob Mørk; Holger Severin Bovbjerg; Gergely Kiss; Zheng-Hua Tan

arXiv:2403.18560·eess.AS·March 28, 2024·1 cites

TL;DR

This paper investigates how self-supervised pretraining, especially Data2Vec, can improve the noise robustness of keyword spotting models, outperforming traditional supervised methods in noisy environments.

Contribution

It demonstrates that self-supervised pretraining, particularly with denoising techniques, enhances KWS robustness in noisy conditions beyond standard supervised training.

Findings

01. Pretraining with self-supervised methods outperforms supervised training in clean conditions.

02. Pretraining with noisy data and Data2Vec-denoising significantly improves robustness in noisy environments.

03. Pretraining alone can surpass multi-style training in certain noisy conditions.
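
The findings above refer to Data2Vec, a student-teacher pretraining scheme in which the teacher's weights track an exponential moving average (EMA) of the student's weights. The sketch below illustrates only that EMA step, in plain Python on a toy parameter dictionary. The function name `ema_update`, the parameter name `tau`, and the toy values are assumptions made for illustration; they are not taken from the paper or from its repository.

```python
def ema_update(teacher, student, tau):
    """Update teacher parameters in place as an EMA of the student.

    teacher, student: dicts mapping a parameter name to a list of floats
                      (a stand-in for real model weights).
    tau: decay in [0, 1]; values close to 1 make the teacher change slowly.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            # teacher <- tau * teacher + (1 - tau) * student
            teacher_values[i] = tau * teacher_values[i] + (1.0 - tau) * s


# Toy example with a single two-element parameter (hypothetical values).
teacher = {"w": [0.0, 0.0]}
student = {"w": [1.0, 2.0]}
ema_update(teacher, student, tau=0.9)
print(teacher["w"])  # roughly one tenth of the way towards the student
```

In a real training loop this update would typically run after each optimizer step on the student, with the teacher's outputs on the unmasked input serving as regression targets for the student.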

Abstract

Voice assistants are now widely available, and a keyword spotting (KWS) algorithm is used to activate them. Modern KWS systems are mainly trained with supervised learning and require a large amount of labelled data to perform well. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase accuracy in clean conditions. This paper explores how SSL pretraining such as Data2Vec can be used to enhance the robustness of KWS models in noisy conditions, a topic that remains under-explored. Models of three different sizes are pretrained using different pretraining approaches and then fine-tuned for KWS. These models are then tested and compared to models trained using two baseline supervised learning methods: standard training on clean data and multi-style training (MTR). The results show that pretraining and…
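
Multi-style training, as commonly understood, means training on speech that has been corrupted with noise at a range of signal-to-noise ratios (SNRs). The sketch below shows one way to mix a noise clip into a speech clip at a chosen SNR. It is a minimal illustration in plain Python, not the authors' pipeline: the function names `signal_power` and `mix_at_snr`, the synthetic signals, and the 5 dB value are all assumptions.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[:len(speech)]
    p_speech = signal_power(speech)
    p_noise = signal_power(noise)
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # Choose scale so that p_speech / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for real audio: a sine wave as "speech", Gaussian noise.
rng = random.Random(0)
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

For multi-style training, one would typically draw a different noise clip and SNR for each training utterance, so the model sees a variety of conditions rather than a single fixed one.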

Code & Models

Repositories

aau-es-ml/ssl_noise-robust_kws (PyTorch, official)

Taxonomy

Topics: Advanced Text Analysis Techniques