Open-vocabulary Keyword-spotting with Adaptive Instance Normalization
Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet

TL;DR
This paper introduces AdaKWS, a novel keyword spotting method that uses keyword-conditioned normalization to improve detection accuracy across multiple languages and low-resource scenarios.
Contribution
The paper presents AdaKWS, a new approach employing a text encoder to generate normalization parameters conditioned on keywords, enhancing open-vocabulary keyword spotting performance.
Findings
Significant improvements over recent keyword spotting and ASR baselines on multilingual benchmarks.
Effective performance on low-resource languages unseen during training.
Robust detection across diverse and challenging datasets.
Abstract
Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multi-lingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during the training. The results demonstrate a substantial performance improvement compared to…
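The mechanism the abstract describes, a text encoder that emits normalization parameters which are then applied to the audio representation, can be illustrated with a minimal sketch of adaptive instance normalization. This is not the authors' implementation: `toy_text_encoder` is a hypothetical stand-in for the trained text encoder (it only derives deterministic pseudo-random values from the keyword string), and the feature layout (a list of channels, each a list of values over time frames) is an assumption made for illustration.

```python
import math
import random


def instance_norm(features, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over time.

    features: list of channels, each a list of floats (one per time frame).
    """
    normalized = []
    for channel in features:
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        normalized.append([(x - mean) / math.sqrt(var + eps) for x in channel])
    return normalized


def toy_text_encoder(keyword, num_channels):
    """Hypothetical stand-in for a trained text encoder.

    Returns per-channel (gamma, beta) that depend only on the keyword.
    A real system would learn this mapping; here it is pseudo-random.
    """
    rng = random.Random(keyword)  # same keyword -> same parameters
    gamma = [1.0 + 0.1 * rng.uniform(-1.0, 1.0) for _ in range(num_channels)]
    beta = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(num_channels)]
    return gamma, beta


def adaptive_instance_norm(features, gamma, beta):
    """Instance-normalize the audio features, then apply the
    keyword-conditioned scale (gamma) and shift (beta) per channel."""
    normalized = instance_norm(features)
    return [
        [g * x + b for x in channel]
        for channel, g, b in zip(normalized, gamma, beta)
    ]


# Two feature channels over four time frames (made-up numbers).
audio = [[0.0, 1.0, 2.0, 3.0], [5.0, 5.5, 4.5, 6.0]]
gamma, beta = toy_text_encoder("hello", num_channels=len(audio))
out = adaptive_instance_norm(audio, gamma, beta)
```

After this step each channel of `out` has a mean close to the keyword's `beta` and a standard deviation close to its `gamma`, so the same audio is processed differently depending on which keyword is being queried. How the conditioned features are turned into a detection decision is not covered by this sketch.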
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
