iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation
Yong-Hyeok Lee, Namhyun Cho

TL;DR
This paper presents iPhonMatchNet, a zero-shot keyword spotting model that uses implicit acoustic echo cancellation to effectively handle overlapping speech and playback audio in smart devices, achieving high accuracy with minimal model size increase.
Contribution
The paper introduces a novel zero-shot keyword spotting approach leveraging implicit acoustic echo cancellation, improving performance in barge-in scenarios without needing clean signals.
Findings
Achieves a 95% reduction in mean absolute error relative to the baseline model, PhonMatchNet.
Increases model size by only 0.13% over the same baseline.
Achieves competitive performance under the real-world deployment conditions of smart devices.
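The two percentages above are relative changes against the baseline. As a minimal sketch of that arithmetic, the snippet below computes a signed relative change. The MAE values and parameter counts are hypothetical, chosen only so that the results match the reported figures; they are not taken from the paper.

```python
def relative_change(baseline: float, proposed: float) -> float:
    """Signed change of `proposed` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (proposed - baseline) / baseline * 100.0


# Hypothetical values (NOT from the paper), picked to reproduce the
# reported percentages.
baseline_mae, proposed_mae = 0.20, 0.01
baseline_params, proposed_params = 1_000_000, 1_001_300

mae_change = relative_change(baseline_mae, proposed_mae)
size_change = relative_change(baseline_params, proposed_params)

print(f"MAE change: {mae_change:.1f}%")
print(f"Model size change: {size_change:.2f}%")
```

A negative value indicates a reduction, so a 95% reduction in error corresponds to a relative change of -95%.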
Abstract
In response to the increasing interest in human-machine communication across various domains, this paper introduces a novel approach called iPhonMatchNet, which addresses the challenge of barge-in scenarios, wherein user speech overlaps with device playback audio, thereby creating a self-referencing problem. The proposed model leverages implicit acoustic echo cancellation (iAEC) techniques to increase the efficiency of user-defined keyword spotting models, achieving a remarkable 95% reduction in mean absolute error with a minimal increase in model size (0.13%) compared to the baseline model, PhonMatchNet. We also present an efficient model structure and demonstrate its capability to learn iAEC functionality without requiring a clean signal. The findings of our study indicate that the proposed model achieves competitive performance in real-world deployment conditions of smart devices.
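To make the barge-in setting concrete, here is a toy sketch of the signals involved. It assumes a simplified echo path (a single delayed, attenuated copy of the playback) and assumes the model input pairs the microphone signal with the playback reference. The function names, the echo model, and the sample-level pairing are all illustrative assumptions; this summary does not describe the paper's actual features or architecture.

```python
def simulate_barge_in(speech, playback, echo_gain=0.5, delay=2):
    """Toy microphone signal: user speech plus a delayed, attenuated playback echo.

    The single-tap echo path is an assumption made for illustration only.
    """
    if len(speech) != len(playback):
        raise ValueError("speech and playback must have the same length")
    mic = []
    for n, sample in enumerate(speech):
        echo = echo_gain * playback[n - delay] if n >= delay else 0.0
        mic.append(sample + echo)
    return mic


def build_model_input(mic, playback):
    """Pair each microphone sample with the playback reference sample.

    In an implicit-AEC setup the model would see both channels and be trained
    on the keyword-spotting objective; the clean speech is never a target.
    """
    return list(zip(mic, playback))


# Hypothetical signals: the clean speech is used only to synthesize the mic signal.
speech = [0.0, 0.5, -0.5, 0.25, 0.0, 0.0]
playback = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

mic = simulate_barge_in(speech, playback)
features = build_model_input(mic, playback)
print(features)
```

The point of the sketch is that `speech` never reaches the model: only `mic` and `playback` do, which is what learning echo cancellation "without requiring a clean signal" implies.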
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
