Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data

Chia-Hao Shen; Janet Y. Sung; Hung-Yi Lee

arXiv:1707.06519 · cs.CL · February 20, 2018 · 1 citation

Open Access

TL;DR

This paper shows that an Audio Word2Vec model trained on a high-resource source language can still capture the phonetic structure of audio segments in a low-resource target language, provided the two languages are similar.

Contribution

The paper examines language transfer in Audio Word2Vec, showing that a model trained on one (source) language can extract vector representations for audio segments of another (target) language, particularly when the two languages are similar.

Findings

01 The Sequence-to-sequence Autoencoder (SA) captures phonetic structure across similar languages

02 Transferred models outperform naive target-specific encoders

03 Training on a high-resource language benefits low-resource language applications

Abstract

Audio Word2Vec offers vector representations of fixed dimensionality for variable-length audio segments using Sequence-to-sequence Autoencoder (SA). These vector representations are shown to describe the sequential phonetic structures of the audio segments to a good degree, with real world applications such as query-by-example Spoken Term Detection (STD). This paper examines the capability of language transfer of Audio Word2Vec. We train SA from one language (source language) and use it to extract the vector representation of the audio segments of another language (target language). We found that SA can still catch phonetic structure from the audio segments of the target language if the source and target languages are similar. In query-by-example STD, we obtain the vector representations from the SA learned from a large amount of source language data, and found them surpass the…
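
The abstract's two core ideas, a fixed-dimensional vector for a variable-length audio segment and query-by-example retrieval over those vectors, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the recurrent encoder uses random weights as a stand-in for an SA encoder trained on source-language data, the 13-dimensional frames and 8-dimensional hidden state are arbitrary choices, and cosine similarity is assumed as the comparison measure. It is not the authors' implementation.

```python
import math
import random


def make_encoder(input_dim, hidden_dim, seed=0):
    """Random RNN weights; a stand-in for a trained SA encoder (assumption)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    return w_in, w_rec


def encode(segment, encoder):
    """Read every frame in order and return the final hidden state.

    The output length depends only on the hidden size, not on the
    number of frames in the segment.
    """
    w_in, w_rec = encoder
    hidden = [0.0] * len(w_in)
    for frame in segment:
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], frame))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(hidden))
        ]
    return hidden


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_segments(query, segments, encoder):
    """Return (score, index) pairs, best match first."""
    q = encode(query, encoder)
    scores = [(cosine(q, encode(s, encoder)), i)
              for i, s in enumerate(segments)]
    return sorted(scores, reverse=True)


# Synthetic "target-language" segments of different lengths (random frames).
data_rng = random.Random(1)


def random_segment(n_frames, dim=13):
    return [[data_rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(n_frames)]


encoder = make_encoder(input_dim=13, hidden_dim=8)
segments = [random_segment(n) for n in (20, 35, 50)]
query = segments[1]  # query-by-example: the query is itself an audio segment
ranking = rank_segments(query, segments, encoder)
print(ranking)
```

In the language-transfer setting described above, the encoder weights would be learned on source-language audio and then applied unchanged to target-language segments; the sketch skips training and only shows the encode-then-compare flow.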

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
