Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition
Jocelyn Huang, Oleksii Kuchaiev, Patrick O'Neill, Vitaly Lavrukhin, Jason Li, Adriana Flores, Georg Kucsko, Boris Ginsburg

TL;DR
This paper explores transfer learning and continuous learning techniques to improve end-to-end automatic speech recognition across different languages, accents, and domains, demonstrating higher accuracy and faster convergence than training from scratch.
Contribution
It shows that transfer learning from a pre-trained English model works effectively for new accents, new languages, and application-specific domains, and that fine-tuning large pre-trained models outperforms fine-tuning small ones.
Findings
Transfer learning improves accuracy over models trained from scratch.
Fine-tuning large pre-trained models is more effective than fine-tuning small ones, even when the fine-tuning dataset is small.
Transfer learning speeds up convergence for both very small and very large target datasets.
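ASR accuracy comparisons like these are conventionally reported as word error rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words, so lower is better. This page does not name the metric, so reading "accuracy" as WER is an assumption. A minimal, stdlib-only sketch of the computation follows; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Because WER can exceed 1.0 when the hypothesis contains many insertions, it is usually reported as a percentage rather than converted to an "accuracy" figure.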
Abstract
In this paper, we demonstrate the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks. We start with a pre-trained English ASR model and show that transfer learning can be effectively and easily performed on: (1) different English accents, (2) different languages (German, Spanish and Russian) and (3) application-specific domains. Our experiments demonstrate that in all three cases, transfer learning from a good base model yields higher accuracy than a model trained from scratch. Fine-tuning large pre-trained models is preferable to fine-tuning small ones, even if the dataset for fine-tuning is small. Moreover, transfer learning significantly speeds up convergence for both very small and very large target datasets.
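The cross-language case differs from the accent and domain cases in one mechanical respect: a new language usually has a different output alphabet, so the pre-trained output (decoder) layer no longer fits. A common recipe, assumed here rather than taken from this page, is to copy all pre-trained weights, keep the encoder, and re-initialise only the output layer when the character set changes. The sketch below models parameters as plain nested lists; the parameter names (`encoder.weight`, `decoder.weight`), the extra CTC blank row, and the helpers `init_matrix` and `transfer_weights` are hypothetical illustrations, not the authors' code or any library's API.

```python
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.01) -> Matrix:
    """Small Gaussian initialisation for a rows x cols weight matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def transfer_weights(
    pretrained: Dict[str, Matrix],
    source_vocab: Sequence[str],
    target_vocab: Sequence[str],
    decoder_key: str = "decoder.weight",  # hypothetical parameter name
    seed: int = 0,
) -> Dict[str, Matrix]:
    """Initialise a target model from a pre-trained one (biases omitted for brevity)."""
    # Deep-copy every pre-trained matrix so the source model is left untouched.
    params = {name: [row[:] for row in w] for name, w in pretrained.items()}
    if list(source_vocab) == list(target_vocab):
        # Same alphabet (new accent or new domain): reuse every layer, decoder included.
        return params
    # New alphabet (new language): keep the encoder, re-initialise the output layer.
    hidden = len(pretrained[decoder_key][0])
    # Assumes a CTC model: one output row per character plus one for the blank symbol.
    params[decoder_key] = init_matrix(len(target_vocab) + 1, hidden, random.Random(seed))
    return params


english = list(" abcdefghijklmnopqrstuvwxyz'")          # space + 26 letters + apostrophe
russian = list(" абвгдеёжзийклмнопрстуфхцчшщъыьэюя")   # space + 33 letters

# Toy "pre-trained English model": an 8 -> 4 encoder and a 4 -> (28 + 1) decoder.
pretrained_en = {
    "encoder.weight": init_matrix(4, 8, random.Random(1)),
    "decoder.weight": init_matrix(len(english) + 1, 4, random.Random(2)),
}

accent_model = transfer_weights(pretrained_en, english, english)
russian_model = transfer_weights(pretrained_en, english, russian)
print(len(russian_model["decoder.weight"]))  # 35 output rows: 34 characters + blank
```

For accents and application-specific domains the character set is unchanged, so every layer, including the decoder, starts from the English weights; fine-tuning then proceeds on the target data in the usual way.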
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
