Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches
Shane Settle, Karen Livescu

TL;DR
This paper introduces RNN-based discriminative acoustic word embedding models, demonstrating their superiority over previous methods in word discrimination tasks and analyzing factors influencing embedding quality.
Contribution
The paper presents novel RNN-based discriminative embedding models, including classifier and Siamese architectures, for improved acoustic word representations in speech tasks.
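To make the idea of an RNN-based acoustic word embedding concrete, here is a minimal NumPy sketch of how a recurrent network can turn a variable-length sequence of acoustic frames into one fixed-size vector. It is an illustration under assumptions, not the authors' architecture: the single tanh layer, the use of the final hidden state as the embedding, the feature and embedding sizes, and all names (`rnn_embed`, `W_in`, `W_rec`) are hypothetical.

```python
import numpy as np

def rnn_embed(frames, W_in, W_rec, b):
    """Map a (T, feat_dim) array of acoustic frames to a fixed-size vector.

    Assumption: a single-layer tanh RNN whose final hidden state serves as
    the embedding. The paper's actual cell type and depth are not given here.
    """
    h = np.zeros(W_rec.shape[0])
    for x_t in frames:                       # one recurrent step per frame
        h = np.tanh(W_in @ x_t + W_rec @ h + b)
    return h

rng = np.random.default_rng(0)
feat_dim, embed_dim = 39, 16                 # hypothetical sizes
W_in = rng.normal(scale=0.1, size=(embed_dim, feat_dim))
W_rec = rng.normal(scale=0.1, size=(embed_dim, embed_dim))
b = np.zeros(embed_dim)

# Two segments of different durations yield embeddings of the same size.
short_segment = rng.normal(size=(40, feat_dim))
long_segment = rng.normal(size=(95, feat_dim))
e_short = rnn_embed(short_segment, W_in, W_rec, b)
e_long = rnn_embed(long_segment, W_in, W_rec, b)
```

Because both outputs have shape `(embed_dim,)`, segments of any duration can be compared with a single vector distance, which is what makes such embeddings an alternative to frame-level alignment.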
Findings
Siamese RNN embeddings outperform embeddings from the classifier-based RNN models.
Both model types improve over previously reported results on word discrimination.
Embedding quality is influenced by dimensionality and network structure.
Abstract
Acoustic word embeddings (fixed-dimensional vector representations of variable-length spoken word segments) have begun to be considered for tasks such as speech recognition and query-by-example search. Such embeddings can be learned discriminatively so that they are similar for speech segments corresponding to the same word, while being dissimilar for segments corresponding to different words. Recent work has found that acoustic word embeddings can outperform dynamic time warping on query-by-example search and related word discrimination tasks. However, the space of embedding models and training approaches is still relatively unexplored. In this paper we present new discriminative embedding models based on recurrent neural networks (RNNs). We consider training losses that have been successful in prior work, in particular a cross entropy loss for word classification and a…
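The two training signals mentioned in the abstract can be sketched in a few lines of NumPy. The cross-entropy loss for word classification is standard. The abstract is cut off before it names the second loss, so the triplet hinge loss on cosine distance below is an assumption: it is one common way to make same-word segments similar and different-word segments dissimilar, not a confirmed description of the paper's loss. The function names and the margin value are hypothetical.

```python
import numpy as np

def cross_entropy(logits, label):
    """Negative log-probability of the correct word class under a softmax."""
    shifted = logits - logits.max()              # subtract max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def cosine_distance(a, b):
    """1 minus the cosine similarity of two embedding vectors."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triplet_hinge(anchor, same, different, margin=0.5):
    """Assumed Siamese-style loss (not confirmed by the text above).

    Zero once the same-word embedding is closer to the anchor than the
    different-word embedding by at least `margin`; positive otherwise.
    """
    gap = cosine_distance(anchor, same) - cosine_distance(anchor, different)
    return max(0.0, margin + gap)

anchor = np.array([1.0, 0.0])
same_word = np.array([0.9, 0.1])                 # nearly parallel to anchor
other_word = np.array([0.0, 1.0])                # orthogonal to anchor

well_separated = triplet_hinge(anchor, same_word, other_word)
violated = triplet_hinge(anchor, other_word, same_word)
uniform_ce = cross_entropy(np.zeros(3), 1)       # equals log(3) for 3 classes
```

In this toy case the first triplet already satisfies the margin and incurs no loss, while the swapped triplet is penalized, which is the behavior a discriminative embedding objective needs.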
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Time Series Analysis and Forecasting
