Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data
Chia-Hao Shen, Janet Y. Sung, Hung-Yi Lee

TL;DR
This paper demonstrates that Audio Word2Vec models trained on a high-resource source language can be transferred to a low-resource target language and still capture its phonetic structure, particularly when the two languages are similar, enabling cross-lingual audio segment analysis.
Contribution
It introduces the concept of language transfer in Audio Word2Vec, showing that models trained on one language can be used for another, especially when the languages are similar.
Findings
SA (Sequence-to-sequence Autoencoder) captures phonetic structure in target-language audio when the source and target languages are similar
Transfer models outperform naive target-specific encoders
High-resource language training benefits low-resource language applications
Abstract
Audio Word2Vec offers vector representations of fixed dimensionality for variable-length audio segments using a Sequence-to-sequence Autoencoder (SA). These vector representations are shown to describe the sequential phonetic structures of the audio segments to a good degree, with real-world applications such as query-by-example Spoken Term Detection (STD). This paper examines the capability of language transfer of Audio Word2Vec. We train SA on one language (the source language) and use it to extract the vector representations of the audio segments of another language (the target language). We found that SA can still capture phonetic structure from the audio segments of the target language if the source and target languages are similar. In query-by-example STD, we obtain the vector representations from the SA learned from a large amount of source language data, and found that they surpass the…
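The pipeline described in the abstract (encode each variable-length segment into one fixed-dimensional vector, then compare vectors to find matches for a spoken query) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. `ToyRNNEncoder` is a hypothetical, untrained single-layer tanh RNN standing in for the SA encoder; in the paper's setting the encoder would first be trained as part of an autoencoder on source-language audio and then applied unchanged to target-language segments. The feature size, hidden size, random "frames", and the use of cosine similarity for ranking in `rank_segments` are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


class ToyRNNEncoder:
    """Hypothetical stand-in for the SA encoder: a single-layer tanh RNN.

    Weights are random here; in Audio Word2Vec they would be learned by
    training the sequence-to-sequence autoencoder (assumed: on source-language data).
    """

    def __init__(self, feat_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(hidden_dim)
        self.w_in = [[rng.uniform(-scale, scale) for _ in range(feat_dim)]
                     for _ in range(hidden_dim)]
        self.w_rec = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)]
                      for _ in range(hidden_dim)]
        self.hidden_dim = hidden_dim

    def encode(self, frames: Sequence[Sequence[float]]) -> Vector:
        """Map a variable-length sequence of feature frames to one fixed-size vector."""
        h = [0.0] * self.hidden_dim
        for x in frames:
            # The new state depends on the current frame and the previous state.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(u * hk for u, hk in zip(self.w_rec[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        # The final hidden state serves as the segment's vector representation.
        return h


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_segments(query: Vector, database: List[Vector]) -> List[Tuple[int, float]]:
    """Query-by-example STD sketch: sort database segments by similarity to the query."""
    scores = [(i, cosine(query, v)) for i, v in enumerate(database)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Usage: three fake segments of 5, 9 and 12 frames with 3 features per frame.
enc = ToyRNNEncoder(feat_dim=3, hidden_dim=4)
rng = random.Random(1)
segments = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
            for n in (5, 9, 12)]
vectors = [enc.encode(s) for s in segments]
print([len(v) for v in vectors])  # [4, 4, 4]

# Use the second segment as the spoken query and rank all segments against it.
ranking = rank_segments(enc.encode(segments[1]), vectors)
print(ranking)
```

Because the encoder's output size depends only on `hidden_dim`, segments of different durations all map to vectors of the same length, which is what allows a plain vector-similarity comparison in query-by-example STD regardless of how long each spoken segment is.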
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
