Acoustic span embeddings for multilingual query-by-example search

Yushi Hu, Shane Settle, and Karen Livescu

arXiv:2011.11807 · cs.CL · November 25, 2020 · 1 citation

TL;DR

This paper introduces acoustic span embeddings (ASE) for multilingual query-by-example speech search. It reports search that is faster than DTW-based search and more accurate than previous results on QUESST 2015, for arbitrary-length queries in multiple languages, including low-resource and unseen ones.

Contribution

It generalizes acoustic word embeddings to spans of words and demonstrates their effectiveness for multilingual QbE with arbitrary-length queries.

Findings

1. ASE-based search is significantly faster than DTW-based search.
2. ASE outperforms previous state-of-the-art results on QUESST 2015.
3. Multilingual ASE effectively handles low-resource and unseen languages.

Abstract

Query-by-example (QbE) speech search is the task of matching spoken queries to utterances within a search collection. In low- or zero-resource settings, QbE search is often addressed with approaches based on dynamic time warping (DTW). Recent work has found that methods based on acoustic word embeddings (AWEs) can improve both performance and search speed. However, prior work on AWE-based QbE has focused primarily on English data and single-word queries. In this work, we generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to QbE with arbitrary-length queries in multiple unseen languages. We consider the commonly used setting where we have access to labeled data in other languages (in our case, several low-resource languages) distinct from the unseen test languages. We evaluate our approach on the QUESST 2015 QbE…
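To make the embedding-based search idea concrete, here is a minimal sketch of how a query might be scored against one utterance once both can be mapped to fixed-dimensional vectors. It is not the authors' implementation: `mean_pool_embed` is a hypothetical stand-in for a trained span-embedding model, and `search_utterance`, the sliding-window scheme, and the use of cosine similarity are all illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]   # one acoustic feature vector (e.g. one frame of features)
Embedding = List[float]   # fixed-dimensional vector for a whole segment


def mean_pool_embed(segment: Sequence[Frame]) -> Embedding:
    """Hypothetical stand-in for a trained embedding model: averages the frames."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; defined as 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_utterance(
    query: Sequence[Frame],
    utterance: Sequence[Frame],
    embed: Callable[[Sequence[Frame]], Embedding] = mean_pool_embed,
    shift: int = 1,
) -> Tuple[float, int]:
    """Slide a query-length window over the utterance.

    Returns (best cosine similarity, start frame of the best window).
    """
    if not query or not utterance:
        raise ValueError("query and utterance must be non-empty")
    window = min(len(query), len(utterance))
    query_vec = embed(query)
    best_score, best_start = -1.0, 0
    for start in range(0, len(utterance) - window + 1, shift):
        score = cosine_similarity(query_vec, embed(utterance[start:start + window]))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Toy example with 2-dimensional "frames".
query = [[1.0, 0.0], [1.0, 0.0]]
utterance = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score, start = search_utterance(query, utterance)
```

In a setup like this, the window embeddings for the search collection could be computed once and reused across queries, so each query costs only vector comparisons. DTW-based search, by contrast, aligns the query frame by frame against each candidate region.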

Code & Models

Repositories

Yushi-Hu/Query-by-Example (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing