Coarse-To-Fine And Cross-Lingual ASR Transfer

Peter Polák; Ondřej Bojar

arXiv:2109.00916·cs.CL·September 3, 2021

TL;DR

This paper explores transfer learning for end-to-end speech recognition, using an intermediate alphabet and coarse-to-fine training to effectively adapt English models to Czech, reducing training time and error rates.

Contribution

It introduces a novel transfer learning approach with an intermediate alphabet and coarse-to-fine training for cross-lingual ASR adaptation.

Findings

01

Using an intermediate alphabet improves transfer effectiveness.

02

Coarse-to-fine training reduces training time significantly.

03

The method lowers word error rate on Czech ASR.

Abstract

End-to-end neural automatic speech recognition systems have recently achieved state-of-the-art results, but they require large datasets and extensive computing resources. Transfer learning has been proposed to overcome these difficulties even across languages, e.g., German ASR trained from an English model. We experiment with much less related languages, reusing an English model for Czech ASR. To simplify the transfer, we propose to use an intermediate alphabet, Czech without accents, and document that it is a highly effective strategy. The technique is also useful on Czech data alone, in the style of coarse-to-fine training. We achieve substantial reductions in training time as well as word error rate (WER).
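To make the intermediate alphabet concrete, here is a minimal sketch of how Czech transcripts could be mapped to "Czech without accents". It is not the authors' preprocessing code: the function name `strip_accents` is hypothetical, and the approach (Unicode decomposition followed by dropping combining marks) is an assumption about one reasonable way to do it.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Map Czech text to an accent-free alphabet (hypothetical helper).

    NFD normalization splits each accented letter into a base letter
    plus combining marks; dropping the marks leaves the base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # A coarse target for the intermediate training stage.
    print(strip_accents("Příliš žluťoučký kůň"))
```

In a coarse-to-fine setup of the kind the abstract describes, transcripts simplified this way would serve as targets for an intermediate stage, with the full accented Czech alphabet restored for the final stage.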

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing