Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings

Christiaan Jacobs; Herman Kamper

arXiv:2307.02083 · eess.AS · July 6, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a method for building semantic acoustic word embeddings (AWEs) from untranscribed speech in a target language. It transfers a phonetic AWE model pre-trained on labelled data from other languages, so that the resulting embeddings capture word meaning as well as phonetic content.

Contribution

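The paper clusters word segments with a pre-trained multilingual AWE model, derives soft pseudo-word labels from the cluster centroids, and trains a Skipgram-like model on those soft vectors. This yields semantic embeddings without transcriptions and outperforms previous semantic AWE methods on word similarity.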

Findings

01. Outperforms previous semantic AWE methods in word similarity tasks
02. Enables semantic query-by-example search using AWEs
03. Demonstrates effectiveness in untranscribed target languages
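
To make the query-by-example finding concrete, here is a minimal sketch of how a spoken query might be matched against candidate segments once both are embedded. Ranking by cosine similarity is an assumption made for illustration; this listing does not say how the paper performs the search, and `rank_by_cosine` and the toy vectors are hypothetical.

```python
import numpy as np

def rank_by_cosine(query, candidates):
    """Rank candidate embeddings by cosine similarity to a query embedding.

    Assumption: cosine similarity is the ranking score. The source does not
    specify the scoring function used for query-by-example search.
    """
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q                # one cosine score per candidate
    order = np.argsort(-scores)   # indices from best to worst match
    return order, scores[order]

# Toy 3-dimensional vectors standing in for real AWEs (hypothetical data).
query = np.array([1.0, 0.0, 0.0])
segments = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite to the query
])
order, scores = rank_by_cosine(query, segments)
print(order.tolist())
```

With semantic AWEs, the top-ranked segments would ideally include words related in meaning to the query, not only different realisations of the same word.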

Abstract

Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings. In this paper we explore semantic AWE modelling. These AWEs should not only capture phonetics but also the meaning of a word (similar to textual word embeddings). We consider the scenario where we only have untranscribed speech in a target language. We introduce a number of strategies leveraging a pre-trained multilingual AWE model -- a phonetic AWE model trained on labelled data from multiple languages excluding the target. Our best semantic AWE approach involves clustering word segments using the multilingual AWE model, deriving soft pseudo-word labels from the cluster centroids, and then training a Skipgram-like model on the soft vectors. In an intrinsic word similarity task measuring…
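
The clustering and soft-labelling steps described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: plain k-means, a softmax over negative squared distances, the `temperature` parameter, and the random toy embeddings are all choices made here, since the abstract does not give these details.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means (assumed clustering method); returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every segment to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def soft_labels(X, centroids, temperature=1.0):
    """Soft pseudo-word vector per segment: softmax over negative squared
    distances to the centroids (an assumed form of the soft label)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Random vectors standing in for embeddings from a multilingual AWE model
# (hypothetical data; real AWEs would come from the pre-trained model).
rng = np.random.default_rng(0)
awes = np.concatenate(
    [rng.normal(loc=c, scale=0.1, size=(20, 8)) for c in (-2.0, 0.0, 2.0)]
)
centroids = kmeans(awes, k=3)
soft = soft_labels(awes, centroids)
print(soft.shape)
```

In a Skipgram-like setup, each row of `soft` would stand in for the one-hot word vector that a textual Skipgram model normally uses, so a segment that lies between clusters contributes to more than one pseudo-word.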

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling