Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings
Christiaan Jacobs, Herman Kamper

TL;DR
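This paper introduces a method for building semantic acoustic word embeddings (AWEs) in a target language that has only untranscribed speech. It does this by transferring a phonetic AWE model trained on labelled data from other languages. The resulting embeddings improve intrinsic word similarity and enable semantic query-by-example search.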
Contribution
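It clusters word segments with a pre-trained multilingual AWE model, derives soft pseudo-word labels from the cluster centroids, and trains a Skipgram-like model on these soft vectors. This produces semantic embeddings without any target-language transcriptions and outperforms previous semantic AWE methods.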
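The clustering step, followed by deriving soft pseudo-word labels from the cluster centroids, can be illustrated with a small NumPy sketch. It assumes the multilingual AWEs for all target-language word segments are already available as an (N, D) array. The function names `kmeans` and `soft_pseudo_labels`, the farthest-point initialisation, and the softmax over negative squared distances with a `temperature` parameter are hypothetical choices for illustration. The paper's actual clustering algorithm and soft-label formula may differ.

```python
import numpy as np


def _sq_dists(x, centroids):
    """Squared Euclidean distances between rows of x (N, D) and centroids (K, D)."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)


def kmeans(x, k, n_iter=20, seed=0):
    """Lloyd's k-means with farthest-point initialisation (hypothetical choice)."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Next centroid: the point farthest from all centroids chosen so far.
        d2 = _sq_dists(x, np.array(centroids)).min(axis=1)
        centroids.append(x[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        assign = _sq_dists(x, centroids).argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def soft_pseudo_labels(x, centroids, temperature=1.0):
    """Soft pseudo-word label per segment: softmax over negative squared distances."""
    logits = -_sq_dists(x, centroids) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Toy usage: two well-separated groups of "AWEs" in 4 dimensions.
rng = np.random.default_rng(1)
awes = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(5.0, 0.1, (20, 4))])
centroids = kmeans(awes, k=2)
soft = soft_pseudo_labels(awes, centroids)
print(soft.shape)  # (40, 2)
```

A lower `temperature` pushes each soft vector towards a one-hot cluster assignment, and a higher one spreads probability mass over nearby centroids. One plausible reason to prefer soft over hard labels is that an acoustically ambiguous segment can then share information with several clusters.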
Findings
Outperforms previous semantic AWE methods in word similarity tasks
Enables semantic query-by-example search using AWEs
Works in target languages that have only untranscribed speech, using no target-language labels
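To make semantic query-by-example search concrete, the sketch below assumes each search utterance has already been split into word-like segments and each segment embedded with a semantic AWE model. The helper names `cosine_sim` and `qbe_search`, and scoring an utterance by its single best-matching segment, are illustrative assumptions, not the paper's evaluation protocol. With semantic AWEs, a high score can come from a segment that is related in meaning to the spoken query, not only from another spoken instance of the same word.

```python
import numpy as np


def cosine_sim(query, segments):
    """Cosine similarity between one query (D,) and each row of segments (M, D)."""
    q = query / np.linalg.norm(query)
    s = segments / np.linalg.norm(segments, axis=1, keepdims=True)
    return s @ q


def qbe_search(query, utterances):
    """Rank utterances by their best-matching segment (hypothetical scoring rule).

    query: (D,) semantic AWE of the spoken query.
    utterances: list of (M_i, D) arrays, one row per segment in utterance i.
    Returns utterance indices from most to least relevant, and the raw scores.
    """
    scores = np.array([cosine_sim(query, segs).max() for segs in utterances])
    order = np.argsort(-scores)
    return order, scores


# Toy usage with 3-dimensional embeddings.
query = np.array([1.0, 0.0, 0.0])
utterances = [
    np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]),
    np.array([[0.5, 0.5, 0.0]]),
]
order, scores = qbe_search(query, utterances)
print(order)  # [1 2 0]
```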
Abstract
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings. In this paper we explore semantic AWE modelling. These AWEs should not only capture phonetics but also the meaning of a word (similar to textual word embeddings). We consider the scenario where we only have untranscribed speech in a target language. We introduce a number of strategies leveraging a pre-trained multilingual AWE model -- a phonetic AWE model trained on labelled data from multiple languages excluding the target. Our best semantic AWE approach involves clustering word segments using the multilingual AWE model, deriving soft pseudo-word labels from the cluster centroids, and then training a Skipgram-like model on the soft vectors. In an intrinsic word similarity task measuring…
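The final step, training a Skipgram-like model on soft vectors, can be sketched as a skip-gram with a full softmax in which both the centre input and the context target are soft pseudo-word distributions rather than one-hot word IDs. This is a minimal NumPy sketch under stated assumptions: the function name `train_soft_skipgram`, the soft-target cross-entropy loss, plain per-pair SGD, and all hyperparameters are hypothetical, and the paper may use a different objective (for example negative sampling) or architecture.

```python
import numpy as np


def train_soft_skipgram(soft_seqs, dim=16, window=1, lr=0.1, epochs=40, seed=0):
    """Skipgram-like training where every token is a soft pseudo-word vector.

    soft_seqs: list of (T_i, K) arrays; row t is the soft label of segment t
    (each row sums to 1). Returns (w_in, epoch_losses). A segment's semantic
    embedding is its soft label times w_in. All choices here are assumptions.
    """
    rng = np.random.default_rng(seed)
    k = soft_seqs[0].shape[1]
    w_in = rng.normal(0.0, 0.3, size=(k, dim))   # soft input -> embedding
    w_out = rng.normal(0.0, 0.3, size=(k, dim))  # embedding -> context logits
    epoch_losses = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for seq in soft_seqs:
            for t in range(len(seq)):
                lo, hi = max(0, t - window), min(len(seq), t + window + 1)
                for c in range(lo, hi):
                    if c == t:
                        continue
                    h = seq[t] @ w_in                # weighted mix of cluster embeddings
                    target = seq[c]                  # soft context label (K,)
                    logits = w_out @ h               # (K,)
                    e = np.exp(logits - logits.max())
                    p = e / e.sum()                  # predicted context distribution
                    total += -(target * np.log(p + 1e-12)).sum()
                    count += 1
                    g_logits = p - target            # grad of soft-target cross-entropy
                    g_h = w_out.T @ g_logits         # computed before w_out changes
                    w_out -= lr * np.outer(g_logits, h)
                    w_in -= lr * np.outer(seq[t], g_h)
        epoch_losses.append(total / count)
    return w_in, epoch_losses


def toy_soft_labels(ids, k=4, peak=0.85):
    """Hypothetical toy soft labels: `peak` on the given cluster, rest spread evenly."""
    s = np.full((len(ids), k), (1.0 - peak) / (k - 1))
    s[np.arange(len(ids)), ids] = peak
    return s


# Toy corpus: clusters 0 and 1 co-occur, as do clusters 2 and 3.
seqs = [toy_soft_labels([0, 1, 0, 1, 0, 1]), toy_soft_labels([2, 3, 2, 3, 2, 3])]
w_in, losses = train_soft_skipgram(seqs, dim=8, epochs=40)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the input layer is linear in the soft vector, `seq[t] @ w_in` is a weighted average of per-cluster embeddings, so a segment whose soft label is split between two clusters lands between their embeddings instead of being forced into one.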
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
