Coarse-To-Fine And Cross-Lingual ASR Transfer
Peter Polák, Ondřej Bojar

TL;DR
This paper explores transfer learning for end-to-end speech recognition, using an intermediate alphabet and coarse-to-fine training to effectively adapt English models to Czech, reducing training time and error rates.
Contribution
It introduces a novel transfer learning approach with an intermediate alphabet and coarse-to-fine training for cross-lingual ASR adaptation.
Findings
Using an intermediate alphabet improves transfer effectiveness.
Coarse-to-fine training reduces training time significantly.
The method lowers word error rate on Czech ASR.
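For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the summary does not state how the authors scored their outputs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper); tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a hypothesis with many insertions can score above 1.0, which is why WER is an error rate and not a bounded accuracy measure.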
Abstract
End-to-end neural automatic speech recognition systems have recently achieved state-of-the-art results, but they require large datasets and extensive computing resources. Transfer learning has been proposed to overcome these difficulties even across languages, e.g., German ASR trained from an English model. We experiment with much less related languages, reusing an English model for Czech ASR. To simplify the transfer, we propose to use an intermediate alphabet, Czech without accents, and document that it is a highly effective strategy. The technique is also useful on Czech data alone, in the style of coarse-to-fine training. We achieve substantial reductions in training time as well as word error rate (WER).
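The intermediate alphabet described in the abstract, Czech without accents, can be illustrated with a small text-normalization sketch. It assumes the simplified transcripts are produced by removing Unicode combining marks; the function name `strip_accents` is hypothetical, and the authors' actual preprocessing may differ.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Map accented Czech text onto an accent-free alphabet.

    Illustrative only: decompose each character (NFD), drop the
    combining marks (Unicode category "Mn"), then recompose (NFC).
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", without_marks)


if __name__ == "__main__":
    # A common Czech pangram fragment that exercises most diacritics.
    print(strip_accents("příliš žluťoučký kůň"))
```

In a coarse-to-fine setup of this kind, a model would first be trained against the simplified transcripts and then continued on the fully accented ones; the sketch covers only the text side of that pipeline.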
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
