Toward noise-robust whisper keyword spotting on headphones with in-earcup microphone and curriculum learning

Qiaoyu Yang

arXiv:2502.00295·eess.AS·March 4, 2025

TL;DR

This paper proposes a noise-robust method for spotting whispered keywords on headphones. It combines input from the in-earcup microphone with curriculum learning and reports an F1 score improvement of up to 15% in noisy conditions.

Contribution

It introduces a novel approach combining multi-microphone processing and curriculum learning to enhance whisper keyword detection on headphones.

Findings

01. F1 score improved by up to 15% in noisy conditions

02. Multi-microphone processing enhances noise robustness

03. Curriculum learning effectively increases whisper keyword detection accuracy

Abstract

The expanding feature set of modern headphones poses a challenge for the design of their control interfaces. Users may want to control each feature separately or switch quickly between modes that activate different features. The traditional approach of physical buttons may no longer be feasible when the feature set is large. Keyword spotting with voice commands is a promising solution. Most existing keyword spotting methods support only commands spoken in a regular voice. However, speaking in a regular voice may not be desirable in quiet places or public settings. In this paper, we investigate the problem of on-device keyword spotting in whisper voice and explore approaches to improve noise robustness. We leverage the inner microphone on noise-cancellation headphones as an additional source of voice input. We also design a curriculum learning strategy that gradually increases the proportion…
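The curriculum idea in the abstract can be illustrated with a small sketch. The abstract is truncated here, so it does not say which proportion is increased or on what schedule. The code below assumes, purely for illustration, that the share of whisper-voice examples in each training batch ramps up linearly over epochs. The function names `whisper_fraction` and `sample_batch`, the `start` and `end` values, and the linear ramp are all hypothetical and are not taken from the paper.

```python
import random


def whisper_fraction(epoch, total_epochs, start=0.0, end=0.5):
    """Assumed schedule: linear ramp of the whisper share from start to end."""
    if total_epochs <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range epochs stay within bounds.
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress


def sample_batch(normal_pool, whisper_pool, batch_size, fraction, rng):
    """Draw one batch containing roughly `fraction` whisper examples."""
    n_whisper = round(batch_size * fraction)
    n_normal = batch_size - n_whisper
    # Sampling with replacement keeps the sketch valid for small pools.
    batch = rng.choices(whisper_pool, k=n_whisper)
    batch += rng.choices(normal_pool, k=n_normal)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    normal = ["normal_%d" % i for i in range(10)]    # placeholder clip IDs
    whisper = ["whisper_%d" % i for i in range(10)]  # placeholder clip IDs
    for epoch in range(5):
        frac = whisper_fraction(epoch, total_epochs=5)
        batch = sample_batch(normal, whisper, batch_size=8, fraction=frac, rng=rng)
        print(epoch, frac, batch)
```

In a real training loop the placeholder clip IDs would be replaced by audio features, and the schedule shape and end point would be tuned on validation data.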

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Sparse Evolutionary Training