End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients
Kesavaraj V, Anuprabha M, Anil Kumar Vuppala

TL;DR
This paper introduces the use of shifted delta coefficients (SDC) for user-defined keyword spotting, demonstrating improved accuracy over traditional features like MFCC by capturing temporal speech dynamics.
Contribution
The study proposes a novel application of SDC features in an end-to-end system for user-defined keyword spotting, outperforming existing methods on multiple datasets.
Findings
SDC features outperform MFCC in keyword spotting accuracy.
The approach improves AUC by 8.32% and reduces EER by 8.69% on challenging datasets.
The method surpasses state-of-the-art UDKWS techniques.
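The findings quote gains in AUC and EER, the two standard summaries of how a detection system trades false accepts against false rejects. As a hedged illustration of how such numbers are typically obtained from trial scores (not the authors' evaluation code), the sketch below computes AUC as the probability that a matching (positive) audio-text trial outscores a non-matching one. It also approximates EER by sweeping thresholds until the false-acceptance and false-rejection rates meet. It assumes higher scores mean a more likely keyword match, and the helper names `auc_from_scores` and `eer_from_scores` are hypothetical.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of matching (label 1) and non-matching (label 0) trials."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")
    return pos, neg


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def eer_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER: accept trials with score >= threshold at each candidate threshold,
    and return the mean of FAR and FRR where the two rates are closest."""
    pos, neg = _split(scores, labels)
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= thr for s in neg) / len(neg)  # non-matches wrongly accepted
        frr = sum(s < thr for s in pos) / len(pos)   # matches wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


# Toy trials: two matching pairs and two non-matching pairs.
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(auc_from_scores(scores, labels), eer_from_scores(scores, labels))  # 0.75 0.5
```

Higher AUC and lower EER are better. The pairwise AUC loop is quadratic in the number of trials and is meant only to make the definitions concrete; larger evaluations usually derive both metrics from an interpolated ROC curve.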
Abstract
Identifying user-defined keywords is crucial for personalizing interactions with smart devices. Previous approaches to user-defined keyword spotting (UDKWS) have relied on short-term spectral features such as mel frequency cepstral coefficients (MFCC) to detect the spoken keyword. However, these features may struggle to distinguish audio-text pairs with closely related pronunciations, because they have limited capability to capture the temporal dynamics of the speech signal. To address this challenge, we propose to use shifted delta coefficients (SDC), which help capture pronunciation variability (transitions between connecting phonemes) by incorporating long-term temporal information. The performance of the SDC feature is compared with various baseline features across four different datasets using a cross-attention based end-to-end system. Additionally, various…
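SDC features stack delta vectors computed at several shifted positions, so each frame carries information from a span of later frames. The summary does not give the SDC configuration used in the paper (the abstract is cut off before it describes the variants explored). The sketch below therefore assumes the standard N-d-P-k parameterization from the language-identification literature: for cepstral frames c(t) truncated to N coefficients, block i of frame t is c(t + iP + d) - c(t + iP - d) for i = 0, ..., k-1, and the k blocks are concatenated. The function name `shifted_delta_coefficients` and the edge clamping at utterance boundaries are illustrative choices, not the authors' implementation.

```python
from typing import List, Sequence

Frame = Sequence[float]


def shifted_delta_coefficients(
    cepstra: Sequence[Frame], n: int = 7, d: int = 1, p: int = 3, k: int = 7
) -> List[List[float]]:
    """Hypothetical SDC helper using the common N-d-P-k parameterization.

    cepstra: per-frame cepstral vectors (e.g. MFCCs), each with at least n values.
    Returns one vector of length n * k per input frame.
    """
    num_frames = len(cepstra)
    if num_frames == 0:
        return []

    def frame(idx: int) -> Frame:
        # Assumed boundary handling: clamp out-of-range indices to the first/last frame,
        # and keep only the first n coefficients.
        return cepstra[min(max(idx, 0), num_frames - 1)][:n]

    features: List[List[float]] = []
    for t in range(num_frames):
        stacked: List[float] = []
        for i in range(k):
            # Delta block i is centred at frame t + i*p, with spread d on each side.
            center = t + i * p
            ahead, behind = frame(center + d), frame(center - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        features.append(stacked)
    return features


# Toy input: 20 frames of 13 coefficients each, using the 7-1-3-7 configuration.
mfcc = [[float(t + j) for j in range(13)] for t in range(20)]
sdc = shifted_delta_coefficients(mfcc)
print(len(sdc), len(sdc[0]))  # 20 49
```

Because block i looks i*P frames ahead, each output vector under the 7-1-3-7 setting draws on frames t - 1 through t + 19. This is the mechanism by which SDC adds the longer-term temporal context that a single frame of MFCCs lacks.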
Taxonomy
Topics: Advanced Text Analysis Techniques · Information Retrieval and Search Behavior · Web Data Mining and Analysis
