Feature learning for efficient ASR-free keyword spotting in low-resource languages
Ewald van der Westhuizen, Herman Kamper, Raghav Menon, John Quinn, and Thomas Niesler

TL;DR
This paper presents a low-resource, efficient keyword spotting method using feature learning with neural networks, leveraging multilingual and autoencoder features to improve performance in under-resourced languages for humanitarian applications.
Contribution
It introduces a novel approach combining multilingual bottleneck features and autoencoder-based features to enhance keyword spotting in severely under-resourced languages.
Findings
Multilingual bottleneck features (BNFs) and correspondence autoencoder (CAE) features improve performance over MFCCs
BNF and CAE features significantly boost ROC AUC and retrieval rates
Proposed CNN-DTW method is nearly as effective as traditional DTW in low-resource settings
Abstract
We consider feature learning for efficient keyword spotting that can be applied in severely under-resourced settings. The objective is to support humanitarian relief programmes by the United Nations in parts of Africa in which almost no language resources are available. For rapid development in such languages, we rely on a small, easily-compiled set of isolated keywords. These keyword templates are applied to a large corpus of in-domain but untranscribed speech using dynamic time warping (DTW). The resulting DTW alignment scores are used to train a convolutional neural network (CNN) which is orders of magnitude more computationally efficient and suitable for real-time application. We optimise this neural network keyword spotter by identifying robust acoustic features in this almost zero-resource setting. First, we incorporate information from well-resourced but unrelated languages using…
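The template-matching step described in the abstract can be illustrated with a minimal sketch. It assumes frame-level features stored as NumPy arrays of shape (frames, dimensions), cosine distance between frames, and a cost normalised by the summed sequence lengths; the abstract specifies none of these details. The function names `dtw_cost` and `best_window_cost` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between the rows of a and the rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return 1.0 - a @ b.T


def dtw_cost(template, segment):
    """Length-normalised DTW alignment cost between two feature sequences.

    Assumption: cosine frame distance and the standard three-way step
    pattern; the paper may use a different local distance or constraints.
    """
    dist = cosine_distance_matrix(template, segment)
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


def best_window_cost(template, utterance, hop=3):
    """Slide the template over the utterance and keep the lowest DTW cost.

    Assumption: the window length equals the template length and the hop
    is a free parameter chosen here for illustration.
    """
    win = len(template)
    if len(utterance) <= win:
        return dtw_cost(template, utterance)
    starts = range(0, len(utterance) - win + 1, hop)
    return min(dtw_cost(template, utterance[s:s + win]) for s in starts)
```

In a pipeline like the one the abstract outlines, a score of this kind, computed for each keyword template against each untranscribed utterance, would serve as the training target for the CNN, so that the expensive alignment is only needed at training time.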
Taxonomy
Methods: Dynamic Time Warping
