Feature learning for efficient ASR-free keyword spotting in low-resource languages

Ewald van der Westhuizen, Herman Kamper, Raghav Menon, John Quinn and Thomas Niesler

arXiv:2108.06174 · eess.AS · August 16, 2021 · Comput. Speech Lang.

TL;DR

This paper presents a computationally efficient, ASR-free keyword spotting method for severely under-resourced languages. It uses neural network feature learning, specifically multilingual and autoencoder features, to improve performance in support of humanitarian applications.

Contribution

The paper combines multilingual bottleneck features with autoencoder-based features to improve keyword spotting in severely under-resourced languages.

Findings

01. BNF and CAE features improve performance over MFCCs

02. BNF and CAE features significantly boost ROC AUC and retrieval rates

03. Proposed CNN-DTW method is nearly as effective as traditional DTW in low-resource settings

Abstract

We consider feature learning for efficient keyword spotting that can be applied in severely under-resourced settings. The objective is to support humanitarian relief programmes by the United Nations in parts of Africa in which almost no language resources are available. For rapid development in such languages, we rely on a small, easily-compiled set of isolated keywords. These keyword templates are applied to a large corpus of in-domain but untranscribed speech using dynamic time warping (DTW). The resulting DTW alignment scores are used to train a convolutional neural network (CNN) which is orders of magnitude more computationally efficient and suitable for real-time application. We optimise this neural network keyword spotter by identifying robust acoustic features in this almost zero-resource setting. First, we incorporate information from well-resourced but unrelated languages using…
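
The template-matching step described in the abstract, where isolated keyword templates are aligned against untranscribed speech with DTW, can be illustrated with a minimal sketch. The details here are assumptions rather than the authors' configuration: the frame-level cosine distance, the normalisation by combined sequence length, the fixed template-length sliding window, and the function names `cosine_distance`, `dtw_cost` and `best_window_cost` are all hypothetical choices made for illustration. The CNN that the paper trains on the resulting scores is not shown.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two feature frames (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def dtw_cost(template, segment, dist=cosine_distance):
    """Length-normalised DTW alignment cost between two frame sequences."""
    n, m = len(template), len(segment)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # acc[i][j] holds the cheapest cost of aligning the first i template
    # frames with the first j segment frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(template[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)  # assumed normalisation


def best_window_cost(template, utterance, hop=1):
    """Slide a template-length window over the utterance and return the
    lowest DTW cost found (lower means a more likely keyword match)."""
    width = len(template)
    if len(utterance) <= width:
        return dtw_cost(template, utterance)
    starts = range(0, len(utterance) - width + 1, hop)
    return min(dtw_cost(template, utterance[s:s + width]) for s in starts)


# Toy two-dimensional "features"; real inputs would be MFCC, BNF or CAE frames.
template = [[1.0, 0.0], [0.0, 1.0]]
utterance = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = best_window_cost(template, utterance)
```

In this toy case the template occurs verbatim inside the utterance, so the best window aligns at zero cost. Because the quadratic inner loop runs for every window and every template, exhaustive search of this kind is expensive, which is consistent with the abstract's motivation for training a CNN to approximate the DTW scores.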

Taxonomy

Methods: Dynamic Time Warping