No Word Left Behind: Mitigating Prefix Bias in Open-Vocabulary Keyword Spotting
Yi Liu, Chuan-Che Huang, Xiao Quan

TL;DR
This paper introduces a new benchmark and a scoring method that reduce prefix bias in open-vocabulary keyword spotting, substantially improving accuracy on negative pairs that share a prefix while maintaining performance on existing datasets.
Contribution
The authors propose the Partial Overlap Benchmark (POB) and Equal-weighting Position Scoring (EPS) to address prefix bias in open-vocabulary keyword spotting (OV-KWS), achieving substantial accuracy improvements.
Findings
EPS reduces EER from 64.4% to 29.3% on POB-Spark
Accuracy on POB-LibriPhrase improves from 87.6% to 96.8%
Adding POB data to training improves overall benchmark performance
Abstract
Open-vocabulary keyword spotting (OV-KWS) enables personalized device control via arbitrary voice commands. Recently, researchers have explored audio-text joint embeddings, which let users enroll phrases as text, and have proposed techniques to disambiguate similar utterances. We find that existing OV-KWS solutions often overweight the beginning phonemes of an enrollment, causing false triggers when negative enrollment-query pairs share a prefix ("turn the volume up" vs. "turn the volume down"). We trace this to two factors: training data bias and position-biased cross-modal scoring. To address these limitations, we introduce the Partial Overlap Benchmark (POB) with two datasets, POB-Spark and POB-LibriPhrase (POB-LP), containing mismatched audio-text pairs with shared prefixes, and propose Equal-weighting Position Scoring (EPS), a lightweight decision layer. Using EPS alone…
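To make the idea of position-biased scoring concrete, the toy sketch below contrasts a scorer whose weights decay with token position against one that weights every enrollment position equally, using the abstract's "turn the volume up" vs. "turn the volume down" pair. This summary does not give the actual EPS formulation, so this is not the authors' method: the per-position match score (`toy_match_scores`, exact word equality), the geometric decay factor, and the 0.8 trigger threshold are all hypothetical stand-ins chosen only to illustrate why emphasizing early positions inflates scores for shared-prefix negatives.

```python
from typing import List, Sequence

# Hypothetical decision threshold, for illustration only.
THRESHOLD = 0.8


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count how many leading tokens two phrases have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def toy_match_scores(enrollment: Sequence[str], query: Sequence[str]) -> List[float]:
    """Hypothetical per-position match: 1.0 if the query token at the
    same position equals the enrollment token, else 0.0."""
    return [
        1.0 if i < len(query) and tok == query[i] else 0.0
        for i, tok in enumerate(enrollment)
    ]


def position_decay_score(scores: Sequence[float], decay: float = 0.5) -> float:
    """Weighted mean where position i gets weight decay**i, so early
    tokens dominate (a stand-in for a prefix-biased scorer)."""
    weights = [decay ** i for i in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def equal_weight_score(scores: Sequence[float]) -> float:
    """Plain mean: every enrollment position contributes equally."""
    return sum(scores) / len(scores)


enrollment = "turn the volume up".split()
query = "turn the volume down".split()  # shared-prefix negative

scores = toy_match_scores(enrollment, query)
print(shared_prefix_len(enrollment, query))            # 3
print(round(position_decay_score(scores), 4))          # 0.9333
print(equal_weight_score(scores))                      # 0.75
print(position_decay_score(scores) >= THRESHOLD)       # True  (false trigger)
print(equal_weight_score(scores) >= THRESHOLD)         # False (correct reject)
```

In this toy setting the three shared leading words carry most of the weight under the decaying scheme, so the negative query clears the threshold, while the equal-weight mean lets the mismatched final word pull the score down.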
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
