Keyword Spotting with Hyper-Matched Filters for Small Footprint Devices

Yael Segal-Feldman, Ann R. Bradlow, Matthew Goldrick, and Joseph Keshet

arXiv:2508.04857 · eess.AS · August 8, 2025

TL;DR

This paper presents a novel open-vocabulary keyword spotting model optimized for small devices, leveraging hyper-matched filters and a Perceiver-based detection network to achieve high accuracy and robustness, even in out-of-domain conditions.

Contribution

Introduces a keyword spotting model with hyper-network generated filters and a Perceiver-based detection mechanism, achieving state-of-the-art accuracy on small-footprint devices.

Findings

1. Achieves state-of-the-art detection accuracy.
2. Generalizes well to out-of-domain and second-language (L2) speech.
3. The smallest model (4.2M parameters) matches the performance of larger models.

Abstract

Open-vocabulary keyword spotting (KWS) refers to the task of detecting words or terms within speech recordings, regardless of whether they were included in the training data. This paper introduces an open-vocabulary keyword spotting model with state-of-the-art detection accuracy for small-footprint devices. The model is composed of a speech encoder, a target keyword encoder, and a detection network. The speech encoder is either a tiny Whisper or a tiny Conformer. The target keyword encoder is implemented as a hyper-network that takes the desired keyword as a character string and generates a unique set of weights for a convolutional layer, which can be viewed as a keyword-specific matched filter. The detection network uses the matched-filter weights to perform a keyword-specific convolution, which guides the cross-attention mechanism of a Perceiver module in determining whether the…
