EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting

Oguzhan Buyuksolak; Alican Gok; Osman Erman Okman

arXiv:2601.16316·eess.AS·January 26, 2026

TL;DR

EdgeSpot is an efficient few-shot keyword spotting model for edge devices. It combines a lightweight BC-ResNet-based backbone, a trainable Per-Channel Energy Normalization frontend, temporal self-attention, and knowledge distillation from a self-supervised teacher, and it achieves higher accuracy at a fixed false-alarm rate than BC-ResNet baselines.

Contribution

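The paper introduces EdgeSpot, a lightweight few-shot keyword spotting model for edge devices that pairs an optimized BC-ResNet-based backbone with a trainable Per-Channel Energy Normalization frontend and lightweight temporal self-attention, trained by distillation from a self-supervised teacher optimized with Sub-center ArcFace loss.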

Findings

01

EdgeSpot-4 improves 10-shot accuracy from 73.7% to 82.0% at 1% FAR.

02

EdgeSpot-4, the largest variant, requires only 29.4M MACs and 128k parameters.

03

EdgeSpot outperforms strong BC-ResNet baselines in accuracy at fixed FAR.
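
The findings above report accuracy at a fixed false-alarm rate (FAR). One common way to read such a metric is sketched below: pick the decision threshold from non-keyword (negative) scores so that at most the target fraction of them is accepted, then count the keyword samples that are both accepted and assigned the right class. This is only an illustration under stated assumptions; the function names, the strict "score > threshold" acceptance rule, and the way the threshold is chosen are hypothetical and are not taken from the paper's evaluation protocol.

```python
def threshold_at_far(negative_scores, target_far):
    """Return a threshold so that the share of negatives scoring strictly
    above it does not exceed target_far (hypothetical helper)."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(target_far * len(ranked))  # false alarms we tolerate
    if allowed >= len(ranked):
        return float("-inf")  # every negative may be accepted
    # Negatives from index `allowed` onward score <= this value and are rejected.
    return ranked[allowed]


def accuracy_at_far(positive_scores, positive_correct, negative_scores, target_far):
    """Fraction of keyword samples that are accepted at the FAR-derived
    threshold and also classified as the correct keyword.

    positive_scores:  top similarity score for each keyword sample
    positive_correct: whether the top-scoring class was the true keyword
    negative_scores:  top similarity score for each non-keyword sample
    """
    threshold = threshold_at_far(negative_scores, target_far)
    hits = sum(
        1
        for score, correct in zip(positive_scores, positive_correct)
        if correct and score > threshold
    )
    return hits / len(positive_scores)
```

A call such as accuracy_at_far(pos_scores, pos_correct, neg_scores, 0.01) would correspond to the "at 1% FAR" setting. Because ties at the threshold are rejected, the realized FAR can be slightly below the target.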

Abstract

We introduce an efficient few-shot keyword spotting model for edge devices, EdgeSpot, that pairs an optimized version of a BC-ResNet-based acoustic backbone with a trainable Per-Channel Energy Normalization frontend and lightweight temporal self-attention. Knowledge distillation is used during training, with a self-supervised teacher model optimized with Sub-center ArcFace loss. This study demonstrates that the EdgeSpot model consistently provides better accuracy at a fixed false-alarm rate (FAR) than strong BC-ResNet baselines. The largest variant, EdgeSpot-4, improves the 10-shot accuracy at 1% FAR from 73.7% to 82.0% while requiring only 29.4M MACs and 128k parameters.
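
For readers unfamiliar with the Per-Channel Energy Normalization (PCEN) frontend named in the abstract, the sketch below follows the commonly cited PCEN formulation: a per-channel exponential smoother M(t, f) = (1 - s) M(t-1, f) + s E(t, f), followed by PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r. In a trainable frontend, s, alpha, delta, and r would be learned per channel; here they are plain lists. The function name, the list-based layout, and the choice to initialize the smoother with the first frame are assumptions made for illustration, not details confirmed by the paper.

```python
def pcen(energy, s, alpha, delta, r, eps=1e-6):
    """Apply PCEN to a sequence of frames.

    energy: list of frames, each a list of non-negative per-channel energies
    s, alpha, delta, r: per-channel parameter lists (learned in a trainable
        frontend; fixed values here)
    """
    n_channels = len(energy[0])
    # Assumption: start the smoother from the first frame's energy.
    smoother = list(energy[0])
    output = []
    for frame in energy:
        row = []
        for f in range(n_channels):
            # First-order IIR smoothing of the energy along time.
            smoother[f] = (1 - s[f]) * smoother[f] + s[f] * frame[f]
            # Adaptive gain control: divide by the smoothed energy.
            gain = (eps + smoother[f]) ** alpha[f]
            # Root compression with offset delta and exponent r.
            row.append((frame[f] / gain + delta[f]) ** r[f] - delta[f] ** r[f])
        output.append(row)
    return output
```

With alpha close to 1, the division by the smoothed energy makes the output largely insensitive to the overall loudness of a stationary input, which is the usual motivation for PCEN over a fixed log compression.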

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling