Enhancing Few-shot Keyword Spotting Performance through Pre-Trained Self-supervised Speech Models

Alican Gok; Oguzhan Buyuksolak; Osman Erman Okman; Murat Saraclar

arXiv:2506.17686·eess.AS·October 9, 2025

TL;DR

This paper introduces a training scheme that combines self-supervised speech models with knowledge distillation to improve few-shot keyword spotting accuracy on edge devices, raising 10-shot classification accuracy on the GSC dataset from 33.4% to 74.1%.

Contribution

The paper proposes a training approach that uses a Wav2Vec 2.0 teacher and attention-based dimensionality reduction to improve few-shot keyword spotting (FS-KWS) performance.

Findings

01. 10-shot classification accuracy improved from 33.4% to 74.1% on the GSC dataset
02. Enhanced inter-class separability and intra-class compactness with Sub-center ArcFace loss
03. Effective deployment on resource-constrained edge devices

Abstract

Keyword Spotting plays a critical role in enabling hands-free interaction for battery-powered edge devices. Few-Shot Keyword Spotting (FS-KWS) addresses the scalability and adaptability challenges of traditional systems by enabling recognition of custom keywords with only a few examples. However, existing FS-KWS systems achieve subpar accuracy at desirable false acceptance rates, particularly in resource-constrained edge environments. To address these issues, we propose a training scheme that leverages self-supervised learning models for robust feature extraction, dimensionality reduction, and knowledge distillation. The teacher model, based on Wav2Vec 2.0, is trained using Sub-center ArcFace loss, which enhances inter-class separability and intra-class compactness. To enable efficient deployment on edge devices, we introduce attention-based dimensionality reduction and train a standard…
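
The abstract names Sub-center ArcFace loss as the teacher's training objective. The sketch below illustrates the commonly described form of that loss: each class holds several sub-centers, the cosine similarity to a class is the maximum over its sub-centers, and an additive angular margin is applied to the target class before a scaled softmax cross-entropy. The function names, the margin and scale values, and the toy vectors are assumptions made for illustration; they are not the paper's implementation or hyperparameters.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def subcenter_arcface_logits(embedding, subcenters, target, margin=0.5, scale=30.0):
    """Return scaled logits; subcenters[c][k] is the k-th sub-center of class c.

    margin and scale are illustrative defaults, not values from the paper.
    """
    e = l2_normalize(embedding)
    logits = []
    for c, centers in enumerate(subcenters):
        # Max-pool the cosine similarity over the sub-centers of class c.
        cos = max(sum(a * b for a, b in zip(e, l2_normalize(w))) for w in centers)
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if c == target:
            # Additive angular margin on the target class only.
            # Simplification: no special handling when theta + margin exceeds pi.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# Toy usage: 2 classes, 2 sub-centers each, 2-dimensional embeddings.
toy_embedding = [1.0, 0.0]
toy_subcenters = [
    [[1.0, 0.0], [0.0, 1.0]],   # class 0
    [[0.0, 1.0], [-1.0, 0.0]],  # class 1
]
toy_logits = subcenter_arcface_logits(toy_embedding, toy_subcenters, target=0)
toy_loss = cross_entropy(toy_logits, target=0)
```

The margin lowers the target-class logit, so the loss stays higher until the embedding sits closer to one of the target's sub-centers, which is what pushes classes apart and pulls samples of the same class together. Taking the maximum over sub-centers lets a class cover several modes, for example different speakers or recording conditions.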

Taxonomy

Methods: Additive Angular Margin Loss