Multi-Sample Dynamic Time Warping for Few-Shot Keyword Spotting
Kevin Wilkinghoff, Alessia Cornaggia-Urrigshardt

TL;DR
This paper introduces a multi-sample dynamic time warping method for few-shot keyword spotting that captures sample variability and balances detection accuracy with computational efficiency.
Contribution
It proposes a novel multi-sample DTW approach that models class variability and reduces inference time compared to naive methods.
Findings
Achieves similar detection performance to using all samples
Runs only slightly slower than using Fréchet means
Outperforms single-sample methods in accuracy
Abstract
In multi-sample keyword spotting, each keyword class is represented by multiple spoken instances, called samples. A naïve approach to detect keywords in a target sequence consists of querying all samples of all classes using sub-sequence dynamic time warping. However, the resulting processing time increases linearly with respect to the number of samples belonging to each class. Alternatively, only a single Fréchet mean can be queried for each class, resulting in reduced processing time but usually also in worse detection performance as the variability of the query samples is not captured sufficiently well. In this work, multi-sample dynamic time warping is proposed to compute class-specific cost-tensors that include the variability of all query samples. To significantly reduce the computational complexity during inference, these cost tensors are converted to cost matrices before…
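To make the naïve baseline from the abstract concrete, the following is a minimal sketch of sub-sequence dynamic time warping applied once per query sample, with the best (lowest) score kept at each target frame. It is not the authors' implementation and does not cover the proposed cost-tensor method. The function names, the Euclidean frame distance, the three-step recursion, and the normalization by query length are all assumptions made for illustration.

```python
import numpy as np


def subsequence_dtw(query, target):
    """Accumulated cost of the best alignment of the whole query to a
    sub-sequence of target, reported for every possible end frame."""
    # Pairwise Euclidean frame distances, shape (len(query), len(target)).
    # The distance measure is an assumption, not taken from the paper.
    cost = np.linalg.norm(query[:, None, :] - target[None, :, :], axis=-1)
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    # Free start: the query may begin at any target frame.
    acc[0, :] = cost[0, :]
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Free end: the last row holds one matching cost per end frame.
    return acc[-1, :]


def naive_multi_sample_scores(samples, target):
    """Query every sample of one class and keep the lowest cost per frame.

    One full DTW pass is needed per sample, so the runtime grows linearly
    with the number of samples. Dividing by the query length is an
    assumed normalization so that samples of different lengths compare.
    """
    per_sample = [subsequence_dtw(s, target) / len(s) for s in samples]
    return np.min(per_sample, axis=0)


# Toy example with one-dimensional features (hypothetical data).
sample_a = np.array([[0.0], [1.0], [2.0]])
sample_b = sample_a + 0.5
target = np.array([[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]])

scores = naive_multi_sample_scores([sample_a, sample_b], target)
best_end = int(np.argmin(scores))  # end frame of the best match
```

In this toy case `sample_a` occurs verbatim in `target`, so the lowest score falls on the frame where that occurrence ends. The loop over `samples` is the part whose cost scales with the number of samples per class, which is the overhead the paper aims to avoid.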
Taxonomy
Topics: Advanced Text Analysis Techniques · Time Series Analysis and Forecasting · Topic Modeling
