Multilingual acoustic word embeddings for zero-resource languages

Christiaan Jacobs

arXiv:2401.10543 · eess.AS · January 24, 2024 · 1 citation


Open Access

TL;DR

This paper presents a neural network-based multilingual acoustic word embedding (AWE) approach for speech applications in zero-resource languages. It demonstrates robust keyword spotting on real-world radio broadcasts and improved semantic query-by-example search.

Contribution

It introduces a new neural network model for multilingual AWEs that outperforms existing models. It also explores how the choice of well-resourced training languages affects performance on zero-resource language tasks.

Findings

1. Outperforms existing AWE models on zero-resource languages
2. Demonstrates robust keyword spotting for hate speech detection in Swahili radio broadcasts
3. Improves semantic query-by-example search using novel semantic AWE models
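One way to use AWEs for keyword spotting is to compare the embedding of a spoken keyword exemplar against the embeddings of windows taken from a search utterance. The following is a minimal sketch under several assumptions. It is not the system described in the paper:

- Mean pooling of frame features stands in for a trained AWE model.
- A single window length, equal to the query duration, is used.
- The names `spot_keyword` and `mean_pool`, the 0.1 cosine-distance threshold and the one-frame hop are all hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[float]  # one acoustic feature vector per frame (assumed input format)


def mean_pool(frames: List[Frame]) -> List[float]:
    """Stand-in embedding (hypothetical): average frames into one fixed-size vector."""
    dim = len(frames[0])
    return [sum(f[j] for f in frames) / len(frames) for j in range(dim)]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat an all-zero embedding as unrelated
    return 1.0 - dot / norm


def spot_keyword(utterance: List[Frame], query: List[Frame],
                 threshold: float = 0.1, hop: int = 1) -> List[Tuple[int, float]]:
    """Slide a query-length window over the utterance and return
    (start_frame, distance) for every window within the threshold."""
    q = mean_pool(query)
    width = len(query)
    hits = []
    for start in range(0, len(utterance) - width + 1, hop):
        d = cosine_distance(q, mean_pool(utterance[start:start + width]))
        if d <= threshold:
            hits.append((start, d))
    return hits


# Toy usage: the query is cut from the utterance itself, so the window
# starting at frame 10 should match with (near) zero distance.
utterance = [[math.sin(0.3 * t), math.cos(0.7 * t)] for t in range(50)]
query = utterance[10:22]
hits = spot_keyword(utterance, query)
best = min(hits, key=lambda h: h[1])
print(best)
```

A practical system would likely replace `mean_pool` with a learned AWE model and search over several window lengths to cover variation in keyword duration. Both are outside the scope of this sketch.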

Abstract

This research addresses the challenge of developing speech applications for zero-resource languages, which lack labelled data. It uses acoustic word embeddings (AWEs), which are fixed-dimensional representations of variable-duration speech segments. It also employs multilingual transfer, in which labelled data from several well-resourced languages are used for pretraining. The study introduces a new neural network that outperforms existing AWE models on zero-resource languages, and it explores how the choice of well-resourced languages affects performance. AWEs are applied to a keyword-spotting system for hate speech detection in Swahili radio broadcasts, demonstrating robustness in real-world scenarios. Additionally, novel semantic AWE models improve semantic query-by-example search.
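To make the definition of an AWE concrete, the sketch below uses uniform downsampling. This is a simple non-neural baseline, not the neural network introduced in this work. It assumes each speech segment is already a list of per-frame feature vectors (for example MFCCs). The function names `downsample_awe` and `cosine_distance` and the default of 10 sampled frames are illustrative choices.

```python
import math
from typing import List

Frame = List[float]  # one acoustic feature vector per frame (assumed input format)


def downsample_awe(frames: List[Frame], n: int = 10) -> List[float]:
    """Uniform-downsampling AWE: keep n evenly spaced frames and concatenate
    them, so every segment maps to a vector of length n * feature_dim."""
    if not frames or n < 1:
        raise ValueError("need at least one frame and n >= 1")
    last = len(frames) - 1
    # Evenly spaced frame indices over [0, last]; short segments repeat frames.
    idx = [round(i * last / (n - 1)) if n > 1 else 0 for i in range(n)]
    return [x for i in idx for x in frames[i]]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity between two fixed-size embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat an all-zero embedding as unrelated
    return 1.0 - dot / norm


# Two toy "words" of different durations (5 and 37 frames, 2-dim features)
short = [[float(t), 1.0] for t in range(5)]
long_ = [[float(t), 1.0] for t in range(37)]
e_short, e_long = downsample_awe(short), downsample_awe(long_)
print(len(e_short), len(e_long))  # 20 20
print(cosine_distance(e_short, e_long))
```

Because the output size is fixed regardless of segment duration, two segments can be compared with a single vector distance rather than an alignment method such as dynamic time warping. A trained neural AWE model would replace `downsample_awe` with a learned mapping while keeping the same fixed-size interface.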

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing