TL;DR
This paper explores multilingual transfer for acoustic word embeddings: a single model trained on labelled data from well-resourced languages is applied to unseen zero-resource languages, supporting speech search and discovery without any labelled data in the target language.
Contribution
It compares three multilingual RNN embedding models (a joint-vocabulary word classifier, a Siamese RNN, and a correspondence autoencoder) and shows that each improves substantially on unsupervised methods in zero-resource scenarios.
Findings
All three multilingual models outperform state-of-the-art unsupervised models, with relative improvements of over 30% in average precision.
The CAE model encodes more phonetic and speaker information than other models.
More training languages generally improve embedding quality, with diminishing returns.
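For readers unfamiliar with the metric cited above, here is a minimal sketch of average precision over word pairs ranked by embedding distance. It assumes the common ranked-retrieval formulation (mean of the precision at each same-word pair). The function name `average_precision` and its arguments are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def average_precision(distances: Sequence[float], same_word: Sequence[bool]) -> float:
    """Average precision for word pairs ranked by increasing distance.

    distances[i] is the embedding distance for pair i; same_word[i] says
    whether pair i contains two instances of the same word. This is an
    illustrative formulation, not the paper's evaluation code.
    """
    if len(distances) != len(same_word):
        raise ValueError("distances and same_word must have equal length")

    # Rank pairs from most similar (smallest distance) to least similar.
    order = sorted(range(len(distances)), key=lambda i: distances[i])

    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if same_word[idx]:
            hits += 1
            # Precision among the top `rank` pairs, recorded at each hit.
            precision_sum += hits / rank

    if hits == 0:
        raise ValueError("at least one same-word pair is required")
    return precision_sum / hits
```

A model that places every same-word pair ahead of every different-word pair scores 1.0, and the score drops as different-word pairs intrude near the top of the ranking.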
Abstract
Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word…
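To illustrate why fixed-dimensional embeddings make search straightforward, the sketch below maps variable-length segments to vectors of equal size and ranks a collection by cosine distance to a query. The embedding here is a simple frame-downsampling stand-in, not one of the paper's RNN models, and the names `embed_by_downsampling`, `cosine_distance`, and `search` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one acoustic feature vector, e.g. a frame of MFCCs


def embed_by_downsampling(segment: Sequence[Frame], n_keep: int = 3) -> List[float]:
    """Stand-in embedding: keep n_keep evenly spaced frames and concatenate them.

    Assumption for illustration only; the paper trains RNN encoders instead.
    """
    if not segment:
        raise ValueError("segment must contain at least one frame")
    last = len(segment) - 1
    if n_keep > 1:
        picks = [round(i * last / (n_keep - 1)) for i in range(n_keep)]
    else:
        picks = [last // 2]
    return [value for idx in picks for value in segment[idx]]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def search(query: Sequence[Frame], collection: Sequence[Sequence[Frame]],
           n_keep: int = 3) -> List[int]:
    """Return collection indices ordered from closest to farthest from the query."""
    q = embed_by_downsampling(query, n_keep)
    dists = [cosine_distance(q, embed_by_downsampling(seg, n_keep))
             for seg in collection]
    return sorted(range(len(collection)), key=lambda i: dists[i])
```

Because every segment ends up with the same dimensionality regardless of its duration, comparing two segments is a single vector operation rather than an alignment. Swapping the stand-in for a trained encoder would leave `search` unchanged.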
