Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring

Raghav Menon; Herman Kamper; John Quinn; Thomas Niesler

arXiv:1806.09374 · cs.CL · June 26, 2018

TL;DR

This paper trains a CNN keyword spotter for under-resourced languages, using DTW scores from a small set of isolated keyword recordings as supervision. The result is a system that is quick to deploy in a new language and fast at runtime, aimed at humanitarian monitoring.

Contribution

It uses DTW scores computed on untranscribed broadcast speech as training targets for a CNN, so that only 34 minutes of labelled keyword recordings are needed in a low-resource language setting.

Findings

01

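The DTW-supervised CNN outperforms a CNN classifier trained only on the keywords (AUC 0.64 vs. 0.54)

02

Needs only 34 minutes of labelled speech (1920 isolated keywords, 40 keyword types), which allows fast deployment in a new language

03

The CNN runs significantly faster than the DTW-based system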
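
For readers unfamiliar with the AUC figures quoted above, here is a minimal sketch of how the area under the ROC curve can be computed from detection scores. It uses the standard pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, so 0.5 corresponds to chance and 1.0 to perfect ranking. The function name `roc_auc` and the toy scores are illustrative assumptions, not taken from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): label 1 means the keyword is present.
toy_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

In this toy case, three of the four positive/negative pairs are ranked correctly.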

Abstract

We use dynamic time warping (DTW) as supervision for training a convolutional neural network (CNN) based keyword spotting system using a small set of spoken isolated keywords. The aim is to allow rapid deployment of a keyword spotting system in a new language to support urgent United Nations (UN) relief programmes in parts of Africa where languages are extremely under-resourced and the development of annotated speech resources is infeasible. First, we use 1920 recorded keywords (40 keyword types, 34 minutes of speech) as exemplars in a DTW-based template matching system and apply it to untranscribed broadcast speech. Then, we use the resulting DTW scores as targets to train a CNN on the same unlabelled speech. In this way we use just 34 minutes of labelled speech, but leverage a large amount of unlabelled data for training. While the resulting CNN keyword spotter cannot match the…
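
The first step described in the abstract, scoring untranscribed speech against keyword exemplars with DTW, could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the cosine frame distance, the length normalisation, the sliding-window search, the `hop` parameter, and the names `dtw_cost` and `dtw_targets` are all hypothetical choices. The output is one score per keyword type, which could then serve as a training target for a CNN that sees only the utterance.

```python
import numpy as np


def dtw_cost(x, y):
    """Length-normalised DTW alignment cost between feature sequences.

    x has shape (n, d) and y has shape (m, d). The cosine frame distance
    is an assumption; the abstract does not specify one.
    """
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    yn = y / (np.linalg.norm(y, axis=1, keepdims=True) + 1e-8)
    dist = 1.0 - xn @ yn.T  # (n, m) frame-level cosine distances
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


def dtw_targets(utterance, exemplars_by_keyword, hop=3):
    """Lowest DTW cost per keyword type over all exemplars and windows."""
    targets = []
    for exemplars in exemplars_by_keyword:
        best = np.inf
        for ex in exemplars:
            win = len(ex)
            last_start = max(len(utterance) - win, 0)
            for start in range(0, last_start + 1, hop):
                window = utterance[start:start + win]
                best = min(best, dtw_cost(ex, window))
        targets.append(best)
    return np.array(targets)


# Synthetic features stand in for real speech features (assumption).
rng = np.random.default_rng(0)
kw0 = rng.normal(size=(10, 13))  # exemplar that occurs in the utterance
kw1 = rng.normal(size=(12, 13))  # exemplar that does not occur
utterance = np.concatenate(
    [rng.normal(size=(6, 13)), kw0, rng.normal(size=(20, 13))]
)
targets = dtw_targets(utterance, [[kw0], [kw1]])
```

A lower cost means a closer match, so here the first keyword, which is embedded in the utterance, gets a cost near zero while the second does not. The nested loops also hint at why DTW search is slow at runtime compared with a single CNN forward pass.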

Taxonomy

Methods: Dynamic Time Warping