Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

Aviv Navon; Aviv Shamsian; Neta Glazer; Gill Hetz; Joseph Keshet

arXiv:2309.08561·eess.AS·September 18, 2023

Open Access

TL;DR

This paper introduces AdaKWS, a novel keyword spotting method that uses keyword-conditioned normalization to improve detection accuracy across multiple languages and low-resource scenarios.

Contribution

The paper presents AdaKWS, a new approach employing a text encoder to generate normalization parameters conditioned on keywords, enhancing open-vocabulary keyword spotting performance.
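The idea of keyword-conditioned normalization can be illustrated with a minimal sketch of adaptive instance normalization. This is not the authors' implementation: the feature layout (`[channels][time]`), the function names, and `toy_keyword_params` (a hypothetical, deterministic stand-in for the trained text encoder) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def instance_norm(channel: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one channel to zero mean and (approximately) unit variance over time."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return [(x - mean) / math.sqrt(var + eps) for x in channel]


def adaptive_instance_norm(
    features: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
    eps: float = 1e-5,
) -> List[List[float]]:
    """Scale and shift each normalized channel with externally supplied parameters.

    features: audio features laid out as [channels][time] (assumed layout).
    gamma, beta: one scale and one shift per channel, derived from the keyword.
    """
    return [
        [g * x + b for x in instance_norm(ch, eps)]
        for ch, g, b in zip(features, gamma, beta)
    ]


def toy_keyword_params(keyword: str, num_channels: int) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for a text encoder: maps a keyword to (gamma, beta).

    A real system would use a learned network here; this only makes the
    parameters depend deterministically on the keyword's characters.
    """
    codes = [ord(c) for c in keyword] or [0]
    gamma = [1.0 + (codes[i % len(codes)] % 7) / 10.0 for i in range(num_channels)]
    beta = [(codes[i % len(codes)] % 5) / 10.0 - 0.2 for i in range(num_channels)]
    return gamma, beta


# Usage: the same audio features are processed differently for each keyword.
features = [[0.1, 0.4, 0.9, 0.2], [1.0, 3.0, 2.0, 4.0]]
gamma, beta = toy_keyword_params("hello", len(features))
conditioned = adaptive_instance_norm(features, gamma, beta)
```

After this step, each channel's statistics are set by the keyword rather than by the audio alone: the per-channel mean equals `beta` and the standard deviation is close to `gamma`.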

Findings

01

Significant improvements over recent keyword spotting and ASR baselines on multilingual benchmarks.

02

Effective performance on low-resource languages unseen during training.

03

Robust detection across diverse and challenging datasets.

Abstract

Open-vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multilingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during training. The results demonstrate a substantial performance improvement compared to…
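For contrast, the joint-embedding approach that the abstract describes as common practice can be sketched as follows. The abstract does not say which affinity function such methods use; cosine similarity, the fixed `threshold`, and the function names here are assumptions chosen for illustration, and the embeddings are taken as already computed by some audio and text encoders.

```python
import math
from typing import Sequence


def cosine_affinity(audio_emb: Sequence[float], keyword_emb: Sequence[float]) -> float:
    """Cosine similarity between two embeddings in a shared space (assumed affinity)."""
    dot = sum(a * k for a, k in zip(audio_emb, keyword_emb))
    norm_a = math.sqrt(sum(a * a for a in audio_emb))
    norm_k = math.sqrt(sum(k * k for k in keyword_emb))
    if norm_a == 0.0 or norm_k == 0.0:
        return 0.0  # no direction to compare; treat as no affinity
    return dot / (norm_a * norm_k)


def keyword_detected(
    audio_emb: Sequence[float],
    keyword_emb: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> bool:
    """Declare a detection when the affinity score reaches the threshold."""
    return cosine_affinity(audio_emb, keyword_emb) >= threshold
```

In this scheme the keyword only enters at the final comparison, whereas a keyword-conditioned normalization lets the keyword influence how the audio itself is processed.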

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing