Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches

Shane Settle; Karen Livescu

arXiv:1611.02550 · cs.CL · November 9, 2016 · 41 cites

Open Access

TL;DR

This paper introduces RNN-based discriminative acoustic word embedding models, demonstrating their superiority over previous methods in word discrimination tasks and analyzing factors influencing embedding quality.

Contribution

The paper presents novel RNN-based discriminative embedding models, including classifier and Siamese architectures, for improved acoustic word representations in speech tasks.

Findings

01

Siamese RNN embeddings outperform classification models.

02

Both models improve over previous results on word discrimination.

03

Embedding quality is influenced by dimensionality and network structure.

Abstract

Acoustic word embeddings (fixed-dimensional vector representations of variable-length spoken word segments) have begun to be considered for tasks such as speech recognition and query-by-example search. Such embeddings can be learned discriminatively so that they are similar for speech segments corresponding to the same word, while being dissimilar for segments corresponding to different words. Recent work has found that acoustic word embeddings can outperform dynamic time warping on query-by-example search and related word discrimination tasks. However, the space of embedding models and training approaches is still relatively unexplored. In this paper we present new discriminative embedding models based on recurrent neural networks (RNNs). We consider training losses that have been successful in prior work, in particular a cross entropy loss for word classification and a…
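To make the idea of a discriminative, fixed-dimensional embedding concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the plain tanh RNN, the cosine-distance triplet hinge loss, the margin, the feature and embedding sizes, and every function name are assumptions chosen for illustration, since the excerpt above does not spell out the Siamese loss or the network details. The sketch only shows two things: segments of different lengths map to vectors of the same size, and a Siamese-style loss can reward same-word pairs for being closer than different-word pairs.

```python
import numpy as np

def rnn_embed(frames, W_in, W_rec, b):
    """Map a (T, d_in) sequence of acoustic frames to a fixed-size vector.

    Illustrative only: uses the final hidden state of a plain tanh RNN.
    """
    h = np.zeros(W_rec.shape[0])
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h + b)
    return h

def cosine_distance(u, v, eps=1e-8):
    """1 - cosine similarity; eps guards against division by zero."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

def triplet_hinge_loss(anchor, same, diff, margin=0.5):
    """Zero once the same-word pair is closer than the different-word
    pair by at least `margin` (an assumed, untuned value)."""
    return max(0.0, margin + cosine_distance(anchor, same)
                    - cosine_distance(anchor, diff))

rng = np.random.default_rng(0)
d_in, d_emb = 39, 16  # hypothetical feature and embedding sizes
W_in = rng.normal(scale=0.1, size=(d_emb, d_in))
W_rec = rng.normal(scale=0.1, size=(d_emb, d_emb))
b = np.zeros(d_emb)

# Three random stand-ins for spoken word segments of different lengths.
seg_a = rng.normal(size=(50, d_in))
seg_b = rng.normal(size=(72, d_in))
seg_c = rng.normal(size=(31, d_in))

emb_a, emb_b, emb_c = (rnn_embed(s, W_in, W_rec, b)
                       for s in (seg_a, seg_b, seg_c))
loss = triplet_hinge_loss(emb_a, emb_b, emb_c)
print(emb_a.shape, emb_b.shape, emb_c.shape)
```

All three embeddings have the same shape regardless of segment length, which is what lets them be compared with a single vector distance rather than an alignment procedure such as dynamic time warping. In a real system the weights would be trained by gradient descent on such a loss over many labeled word pairs; the random weights here serve only to exercise the shapes.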

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Time Series Analysis and Forecasting