TL;DR
This study shows that training multilingual acoustic word embedding models on languages related to the target zero-resource language improves performance, and that data from even a single related language yields large gains.
Contribution
It demonstrates the benefit of selecting related languages for transfer learning in acoustic word embeddings, highlighting the importance of language family proximity.
Findings
Training on related languages improves word discrimination.
Even one related language yields large gains.
Adding unrelated languages does not harm performance.
Abstract
Acoustic word embedding models map variable-duration speech segments to fixed-dimensional vectors, enabling efficient speech search and discovery. Previous work explored how embeddings can be obtained in zero-resource settings where no labelled data is available in the target language. The current best approach uses transfer learning: a single supervised multilingual model is trained using labelled data from multiple well-resourced languages and then applied to a target zero-resource language (without fine-tuning). However, it is still unclear how the specific choice of training languages affects downstream performance. Concretely, here we ask whether it is beneficial to use training languages related to the target. Using data from eleven languages spoken in Southern Africa, we experiment with adding data from different language families while controlling for the amount of data per…
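To make the core idea concrete, here is a minimal sketch of what "mapping a variable-duration segment to a fixed-dimensional vector" means. It does not reproduce the paper's trained multilingual model: as a stand-in it uses simple frame downsampling (picking evenly spaced frames and concatenating them). The function names `downsample_embed` and `cosine_distance`, the parameter `k`, and the toy feature values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def downsample_embed(frames: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Map a (T x D) speech segment to a fixed (k * D) vector.

    Illustrative stand-in only: picks k evenly spaced frames and
    concatenates them. This is not the trained model from the paper.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    if k < 1:
        raise ValueError("k must be at least 1")
    t = len(frames)
    if k == 1:
        idx = [(t - 1) // 2]  # single frame taken from the middle
    else:
        # Evenly spaced indices in [0, t-1], rounded half up with integer
        # arithmetic; frames repeat when the segment is shorter than k.
        idx = [(2 * i * (t - 1) + (k - 1)) // (2 * (k - 1)) for i in range(k)]
    return [x for i in idx for x in frames[i]]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance between two embeddings of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen here for all-zero vectors
    return 1.0 - dot / (norm_u * norm_v)


# Two toy segments of different durations (5 and 9 frames, 2 features each).
short_segment = [[float(j), float(j)] for j in range(5)]
long_segment = [[0.5 * j, 0.5 * j] for j in range(9)]

emb_short = downsample_embed(short_segment, k=3)
emb_long = downsample_embed(long_segment, k=3)
```

Both segments end up as vectors of the same length (`k` times the feature dimension), so they can be compared directly with `cosine_distance`. That fixed-size comparison is what makes search over many segments cheap.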
