Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition

Jocelyn Huang; Oleksii Kuchaiev; Patrick O'Neill; Vitaly Lavrukhin; Jason Li; Adriana Flores; Georg Kucsko; Boris Ginsburg

arXiv:2005.04290 · eess.AS · May 12, 2020 · 20 cites

TL;DR

This paper explores transfer learning and continuous learning techniques to improve end-to-end automatic speech recognition across different languages, accents, and domains, demonstrating higher accuracy and faster convergence than training from scratch.

Contribution

The paper shows that transfer learning is effective for multilingual and domain-specific ASR, and that fine-tuning a large pre-trained model works better than fine-tuning a small one.

Findings

01. Transfer learning improves accuracy over models trained from scratch.

02. Fine-tuning large pre-trained models is more effective than fine-tuning small ones, even on small datasets.

03. Transfer learning speeds up convergence for both very small and very large target datasets.

Abstract

In this paper, we demonstrate the efficacy of transfer learning and continuous learning for various automatic speech recognition (ASR) tasks. We start with a pre-trained English ASR model and show that transfer learning can be effectively and easily performed on: (1) different English accents, (2) different languages (German, Spanish and Russian) and (3) application-specific domains. Our experiments demonstrate that in all three cases, transfer learning from a good base model yields higher accuracy than a model trained from scratch. Fine-tuning a large pre-trained model is preferable to fine-tuning a small one, even when the fine-tuning dataset is small. Moreover, transfer learning significantly speeds up convergence for both very small and very large target datasets.
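
The abstract does not spell out how an English model is moved to a language with a different alphabet. One common approach for character-level CTC models, assumed here rather than taken from the paper, is to copy the pre-trained encoder weights unchanged and re-initialize only the output layer, whose width depends on the target character set. The minimal pure-Python sketch below illustrates that idea; the parameter names (`encoder.*`, `decoder.weight`), the hidden-by-vocabulary matrix layout, the extra CTC blank output and the Gaussian initialization are all hypothetical choices for illustration.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


def init_from_pretrained(
    pretrained: Dict[str, Matrix],
    target_vocab: List[str],
    decoder_prefix: str = "decoder.",
    seed: int = 0,
) -> Dict[str, Matrix]:
    """Build starting weights for a target-language model (toy sketch).

    Every parameter outside the hypothetical decoder prefix is copied from
    the pre-trained model; the output projection is rebuilt because its
    width depends on the target alphabet.
    """
    rng = random.Random(seed)
    params: Dict[str, Matrix] = {}
    for name, weight in pretrained.items():
        if name.startswith(decoder_prefix):
            continue  # vocabulary-dependent; rebuilt below
        params[name] = [row[:] for row in weight]  # copy encoder weights

    # Hidden size = number of rows in the old output layer (hidden x vocab).
    hidden = len(pretrained[decoder_prefix + "weight"])
    # One output per target character plus one for the CTC blank symbol.
    n_out = len(target_vocab) + 1
    # This toy decoder has a single weight matrix, freshly initialized.
    params[decoder_prefix + "weight"] = [
        [rng.gauss(0.0, 0.02) for _ in range(n_out)] for _ in range(hidden)
    ]
    return params


# Usage: a toy "English" model with a 2-dim hidden layer and 29 outputs.
english = {
    "encoder.conv1.weight": [[0.1, 0.2], [0.3, 0.4]],
    "decoder.weight": [[0.0] * 29 for _ in range(2)],
}
russian_vocab = list(" абвгдеёжзийклмнопрстуфхцчшщъыьэюя")
params = init_from_pretrained(english, russian_vocab)
print(len(params["decoder.weight"]), len(params["decoder.weight"][0]))
```

Keeping the encoder intact is what lets the target-language model start from acoustic features already learned on English; only the small, alphabet-specific layer starts from scratch before fine-tuning.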

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing