No Word Left Behind: Mitigating Prefix Bias in Open-Vocabulary Keyword Spotting

Yi Liu; Chuan-Che Huang; Xiao Quan

arXiv:2602.08930 · cs.SD · February 13, 2026

TL;DR

This paper introduces a benchmark, the Partial Overlap Benchmark (POB), and a scoring method, Equal-weighting Position Scoring (EPS), to reduce prefix bias in open-vocabulary keyword spotting, cutting EER on POB-Spark from 64.4% to 29.3% while maintaining performance on existing datasets.

Contribution

The authors propose the Partial Overlap Benchmark (POB) and Equal-weighting Position Scoring (EPS) to address prefix bias in OV-KWS; accuracy on POB-LibriPhrase rises from 87.6% to 96.8%.

Findings

1. EPS reduces the equal error rate (EER) from 64.4% to 29.3% on POB-Spark.
2. Accuracy on POB-LibriPhrase improves from 87.6% to 96.8%.
3. Adding POB data in training enhances overall benchmark performance.
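
For readers unfamiliar with the metric in the first finding, the equal error rate (EER) is the operating point at which the false-accept rate equals the false-reject rate. The sketch below is a generic approximation, not the paper's evaluation code: the function name `equal_error_rate` and the toy scores are hypothetical, and it simply sweeps every observed score as a threshold.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by trying each observed score as a threshold.

    scores: similarity scores, higher means "more likely a match".
    labels: 1 for a true enrollment-query match, 0 for a non-match.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # False-accept rate: non-matches scored at or above the threshold.
        far = sum(s >= threshold for s in negatives) / len(negatives)
        # False-reject rate: true matches scored below the threshold.
        frr = sum(s < threshold for s in positives) / len(positives)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy data (made up for illustration): one non-match outranks one true match.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

An EER above 50%, such as the 64.4% baseline reported on POB-Spark, means the scores rank shared-prefix mismatches above true matches more often than not.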

Abstract

Open-vocabulary keyword spotting (OV-KWS) enables personalized device control via arbitrary voice commands. Recently, researchers have explored audio-text joint embeddings, which allow users to enroll phrases with text, and have proposed techniques to disambiguate similar utterances. We find that existing OV-KWS solutions are often overly biased toward the beginning phonemes of an enrollment, causing false triggers when negative enrollment-query pairs share a prefix ("turn the volume up" vs. "turn the volume down"). We trace this to two factors: training data bias and position-biased cross-modal scoring. To address these limitations, we introduce the Partial Overlap Benchmark (POB) with two datasets, POB-Spark and POB-LibriPhrase (POB-LP), containing mismatched audio-text pairs with shared prefixes, and propose Equal-weighting Position Scoring (EPS), a lightweight decision layer. Using EPS alone…
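
To make the prefix-bias failure mode concrete, here is a toy illustration of why position-dependent weighting can let a shared prefix dominate the score. It is not the paper's EPS formulation, which the abstract does not spell out: the per-word match scores, the front-loaded weights, and the helper `weighted_score` are all assumptions chosen for illustration.

```python
def weighted_score(position_scores, weights):
    """Weighted average of per-position match scores."""
    if len(position_scores) != len(weights):
        raise ValueError("one weight per position is required")
    return sum(s * w for s, w in zip(position_scores, weights)) / sum(weights)


# Hypothetical per-word match scores for the enrolled text "turn the volume up"
# against audio of "turn the volume down": only the final word mismatches.
position_scores = [1.0, 1.0, 1.0, 0.0]

# Assumed prefix-heavy weighting: early positions count more than late ones.
front_loaded = [0.4, 0.3, 0.2, 0.1]
# Equal weighting: every position contributes the same amount.
equal = [1.0] * len(position_scores)

biased = weighted_score(position_scores, front_loaded)   # about 0.9
uniform = weighted_score(position_scores, equal)         # 0.75
```

Under these assumed numbers the mismatched pair scores about 0.9 with front-loaded weights but 0.75 with equal weights, so a fixed decision threshold between those two values would reject the mismatch only in the equal-weighted case.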

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems