PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords

Yong-Hyeok Lee; Namhyun Cho

arXiv:2308.16511·eess.AS·September 1, 2023

TL;DR

PhonMatchNet introduces a zero-shot keyword spotting model leveraging phoneme information, outperforming baselines and rivaling full-shot models across diverse pronunciation scenarios.

Contribution

The paper proposes a two-stream speech encoder architecture, a self-attention-based pattern extractor, and a phoneme-level detection loss, aimed at improving zero-shot keyword spotting across varied pronunciation conditions.

Findings

01

Improves both EER and AUC on all evaluated datasets, with average relative improvements of 67% and 80%, respectively.

02

Outperforms baseline models and rivals full-shot models.

03

Handles familiar words, proper nouns, and keywords with indistinguishable pronunciations.

Abstract

This study presents a novel zero-shot user-defined keyword spotting model that utilizes the audio-phoneme relationship of the keyword to improve performance. Unlike the previous approach that estimates at utterance level, we use both utterance and phoneme level information. Our proposed method comprises a two-stream speech encoder architecture, self-attention-based pattern extractor, and phoneme-level detection loss for high performance in various pronunciation environments. Based on experimental results, our proposed model outperforms the baseline model and achieves competitive performance compared with full-shot keyword spotting models. Our proposed model significantly improves the EER and AUC across all datasets, including familiar words, proper nouns, and indistinguishable pronunciations, with an average relative improvement of 67% and 80%, respectively. The implementation code of…
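
The abstract reports its results as EER (equal error rate) and AUC (area under the ROC curve). The following is a minimal sketch of how these two metrics are commonly computed from per-trial match scores. It is not the authors' evaluation code: the function names, the scores, and the labels are made up for illustration, and the EER here is read off at the nearest ROC operating point rather than interpolated.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (fpr, tpr) arrays, one point per distinct score threshold.

    scores: higher means "audio matches the keyword" (illustrative convention).
    labels: 1 for a true match, 0 for a non-match.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # sort by descending score
    scores, labels = scores[order], labels[order]
    # Keep the last index of each run of tied scores so ties share one threshold.
    distinct = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]
    tp = np.cumsum(labels)[distinct]   # true positives accepted at each threshold
    fp = (distinct + 1) - tp           # false positives accepted at each threshold
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Prepend the (0, 0) point, where nothing is accepted.
    return np.r_[0.0, fp / n_neg], np.r_[0.0, tp / n_pos]


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def eer(fpr, tpr):
    """Equal error rate: the point where false-accept and false-reject rates meet."""
    fnr = 1.0 - tpr
    i = int(np.argmin(np.abs(fpr - fnr)))  # nearest operating point, no interpolation
    return float((fpr[i] + fnr[i]) / 2.0)


# Toy trial list (hypothetical values, not taken from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_fpr, toy_tpr = roc_points(toy_scores, toy_labels)
print(f"AUC = {auc(toy_fpr, toy_tpr):.4f}, EER = {eer(toy_fpr, toy_tpr):.4f}")
```

A lower EER and a higher AUC both indicate better separation between matching and non-matching keyword trials, which is why an "improvement" in the two metrics moves them in opposite directions.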
