Neural Network based End-to-End Query by Example Spoken Term Detection

Dhananjay Ram; Lesly Miculicich; Hervé Bourlard

arXiv:1911.08332·eess.AS·November 20, 2019

TL;DR

This paper introduces an end-to-end neural network framework for query by example spoken term detection (QbE-STD). By jointly optimizing feature extraction and pattern matching, it outperforms traditional DTW-based methods.

Contribution

It presents the first fully neural network-based end-to-end system for QbE-STD, replacing separate feature extraction and matching stages with joint optimization.

Findings

01. Multilingual bottleneck features improve with more training languages.

02. CNN-based matching outperforms DTW-based matching with bottleneck features.

03. End-to-end training significantly improves detection performance.

Abstract

This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching techniques using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features, and show that multilingual features perform increasingly better with more training languages. Previously, it has been shown that DTW-based matching can be replaced with CNN-based matching when using posterior features. Here, we show that CNN-based matching outperforms DTW-based matching when using bottleneck features as well. In this case, the feature extraction and pattern matching stages of our QbE-STD system are optimized independently of each other. We propose to integrate these two stages in a fully neural…
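
The DTW-based template matching mentioned in the abstract can be illustrated with a minimal sketch. The Python below is not the authors' implementation. It assumes each frame is a plain list of floats standing in for posterior or bottleneck features, uses cosine distance as the local cost, and normalizes the final cost by query length; all of these are illustrative choices. The function names `cosine_distance` and `subsequence_dtw_score` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat an all-zero frame as dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def subsequence_dtw_score(query, utterance):
    """Lowest alignment cost of `query` against any contiguous region of
    `utterance`, divided by the query length. Lower means a better match."""
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    # First query frame may align to any utterance frame at no extra cost,
    # which lets the match start anywhere in the utterance.
    prev = [cosine_distance(query[0], utterance[j]) for j in range(m)]

    for i in range(1, n):
        curr = [0.0] * m
        for j in range(m):
            local = cosine_distance(query[i], utterance[j])
            best = prev[j]  # query advances, utterance frame repeats
            if j > 0:
                # diagonal step, or utterance advances while query frame repeats
                best = min(best, prev[j - 1], curr[j - 1])
            curr[j] = local + best
        prev = curr

    # The match may end at any utterance frame.
    return min(prev) / n


# Toy example: the query occurs exactly inside the first utterance only.
query = [[1.0, 0.0], [0.0, 1.0]]
utt_with_query = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
utt_without_query = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

score_hit = subsequence_dtw_score(query, utt_with_query)
score_miss = subsequence_dtw_score(query, utt_without_query)
```

A lower score indicates a closer match, so a detection decision would compare the score against a threshold tuned on development data. Practical systems often normalize by alignment path length rather than query length; the simpler normalization here is only for brevity.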

Taxonomy

Methods: Dynamic Time Warping