Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Ondrej Klejch; Electra Wallington; Peter Bell

arXiv:2111.06799·cs.CL·June 7, 2022

Open Access

TL;DR

This paper introduces a zero-resource cross-lingual ASR method that uses a decipherment algorithm on unpaired speech and text data, enabling effective speech recognition without any target language transcriptions or phonetic knowledge.

Contribution

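It presents what the authors describe as, to the best of their knowledge, the first practical zero-resource cross-lingual ASR approach that does not depend on hand-crafted phonetic information. The approach applies a decipherment algorithm, in a novel way, to unpaired speech and text data.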

Findings

01

Showed that a decipherment model can be learned from just 20 minutes of target-language data.

02

Demonstrated the effectiveness of decipherment on unpaired speech and text data.

03

To the authors' knowledge, the first practical approach to zero-resource cross-lingual ASR that needs no hand-crafted phonetic information.

Abstract

We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language. We apply this decipherment to phone sequences generated by a universal phone recogniser trained on out-of-language speech corpora, which we follow with flat-start semi-supervised training to obtain an acoustic model for the new language. To the best of our knowledge, this is the first practical approach to zero-resource cross-lingual ASR which does not rely on any hand-crafted phonetic information. We carry out experiments on read speech from the GlobalPhone corpus, and show that it is possible to learn a decipherment model on just 20 minutes of data…
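
To give a feel for what "decipherment from unpaired data" means, here is a minimal sketch. It does not reproduce the paper's algorithm, which this summary does not specify. It swaps in a much cruder technique, unigram frequency-rank matching, and the phone labels (`P1`, `P2`, `P3`), the function names, and the toy data are all hypothetical. The idea shown is only that a symbol-to-character mapping can be estimated from phone sequences and text that were never aligned with each other.

```python
from collections import Counter


def rank_symbols(sequences):
    """Return symbols ordered from most to least frequent.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(sym for seq in sequences for sym in seq)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sym for sym, _ in ordered]


def learn_mapping(phone_seqs, text_seqs):
    """Pair the i-th most frequent phone with the i-th most frequent character.

    This is a simplified stand-in for a real decipherment model: it uses
    only unigram statistics and assumes a one-to-one mapping.
    """
    return dict(zip(rank_symbols(phone_seqs), rank_symbols(text_seqs)))


def decipher(phone_seq, mapping, unknown="?"):
    """Map a phone sequence to characters; unseen phones become `unknown`."""
    return "".join(mapping.get(p, unknown) for p in phone_seq)


# Hypothetical output of a universal phone recogniser (unpaired speech side).
phone_seqs = [["P1", "P2", "P1", "P3"], ["P1", "P2", "P1"], ["P1"]]
# Unrelated text sample from the same language (unpaired text side).
text_seqs = ["aabab", "aca", "a"]

mapping = learn_mapping(phone_seqs, text_seqs)
decoded = decipher(["P1", "P2", "P3"], mapping)
print(mapping)
print(decoded)
```

Frequency matching breaks down as soon as two symbols have similar counts or the mapping is not one-to-one, which is why practical decipherment relies on richer sequence statistics than this sketch uses.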

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems