Dummy Prototypical Networks for Few-Shot Open-Set Keyword Spotting
Byeonggeun Kim, Seunghan Yang, Inseop Chung, Simyung Chang

TL;DR
This paper introduces Dummy Prototypical Networks, a novel approach for few-shot open-set keyword spotting that improves open-set detection and is validated on both speech and image benchmarks.
Contribution
The paper proposes Dummy Prototypical Networks, a new metric learning method that enhances open-set detection in few-shot scenarios for keyword spotting and image recognition.
Findings
D-ProtoNets outperform recent few-shot open-set recognition (FSOSR) methods on splitGSC.
D-ProtoNets achieve state-of-the-art open-set detection on miniImageNet.
The approach effectively combines few-shot learning with open-set rejection.
Abstract
Keyword spotting is the task of detecting a keyword in streaming audio. Conventional keyword spotting targets classification of predefined keywords, but few-shot (query-by-example) keyword spotting is drawing growing attention: for example, N-way classification given M support samples per class (M-shot). Moreover, real-world audio can contain utterances from unexpected categories (the open set), which need to be rejected rather than classified as one of the N classes. Combining the two needs, we tackle few-shot open-set keyword spotting with a new benchmark setting, named splitGSC. We propose episode-known dummy prototypes based on metric learning to better detect open-set samples, and introduce a simple and powerful approach, Dummy Prototypical Networks (D-ProtoNets). D-ProtoNets outperforms recent few-shot open-set recognition (FSOSR) approaches by clear margins on the proposed splitGSC. We also…
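The abstract names the idea but does not spell out the mechanics. Below is a minimal sketch, assuming standard prototypical-network inference: each class prototype is the mean support embedding, queries are scored by negative squared Euclidean distance, and a single dummy prototype is appended as an extra "open-set" class for the episode. A query whose nearest prototype is the dummy is rejected. This sketch does not show how D-ProtoNets actually learns or generates its episode-known dummy prototypes. The `dummy` vector and the function names `episode_prototypes` and `classify_with_dummy` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def episode_prototypes(support, labels, n_way):
    """Mean embedding per class; support is (N*M, D), labels are ints in [0, n_way)."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])


def classify_with_dummy(query, prototypes, dummy):
    """Nearest-prototype classification with an extra (hypothetical) dummy prototype.

    Returns predicted indices, an open-set mask (prediction == dummy index),
    and softmax probabilities over the N class prototypes plus the dummy.
    """
    n_way = prototypes.shape[0]
    # Append the dummy prototype as class index n_way.
    all_protos = np.vstack([prototypes, dummy[None, :]])
    # Squared Euclidean distance between every query and every prototype: (Q, N+1).
    dist = ((query[:, None, :] - all_protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -dist
    # Numerically stable softmax over negative distances.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    pred = logits.argmax(axis=1)
    is_open = pred == n_way
    return pred, is_open, probs


# Toy 2-way 2-shot episode in a 2-D embedding space.
support = np.array([[0.0, 0.0], [0.0, 0.2], [10.0, 0.0], [10.0, 0.2]])
labels = np.array([0, 0, 1, 1])
dummy = np.array([5.0, 10.0])  # hypothetical dummy prototype for this episode
query = np.array([[0.1, 0.0], [9.9, 0.0], [5.0, 9.0]])

protos = episode_prototypes(support, labels, n_way=2)
pred, is_open, probs = classify_with_dummy(query, protos, dummy)
```

In this toy episode, the first two queries land on class 0 and class 1. The third query lies closest to the dummy prototype, so it is flagged as open-set instead of being forced into one of the N known classes.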
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
