Towards scalable efficient on-device ASR with transfer learning

Laxmi Pandey; Ke Li; Jinxi Guo; Debjyoti Paul; Arthur Guo; Jay Mahadeokar; Xuedong Zhang

arXiv:2407.16664·cs.CL·July 24, 2024

TL;DR

This paper demonstrates that transfer learning with multilingual RNNT-loss pretraining significantly improves on-device ASR performance, especially for low-resource languages and rare words, with WER reductions reaching 36.2% on MLS and 42.8% on in-house datasets relative to monolingual baselines.

Contribution

The paper systematically investigates the effects of transfer learning on multilingual ASR, highlighting the benefits of RNNT-loss pretraining and of pretraining across dataset domains for low-resource scenarios.

Findings

01

RNNT-loss pretraining followed by monolingual MinWER fine-tuning consistently reduces WER across languages such as Italian and French.

02

Out-of-domain pretraining yields 28% higher WERR than in-domain pretraining.

03

Both rare and non-rare words benefit, with rare words gaining more from out-of-domain pretraining.

Abstract

Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition compared to non-rare words. Our findings suggest that RNNT-loss pretraining, followed by monolingual fine-tuning with Minimum Word Error Rate (MinWER) loss, consistently reduces Word Error Rates (WER) across languages such as Italian and French. WER reductions (WERR) reach 36.2% and 42.8% compared to monolingual baselines for the MLS and in-house datasets, respectively. Out-of-domain pretraining leads to 28% higher WERR than in-domain pretraining. Both rare and non-rare words benefit, with rare words showing…
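
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how WER and WERR are commonly computed. It assumes the standard definitions: WER as word-level edit distance divided by the number of reference words, and WERR as the relative reduction in WER against a baseline. The page does not spell out the paper's exact scoring setup, so the function names, the whitespace tokenization, and the example numbers here are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes simple whitespace tokenization (an illustrative choice).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """WERR in percent: how much lower new_wer is, relative to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WER values chosen only to illustrate the formula.
    print(relative_wer_reduction(baseline_wer=10.0, new_wer=6.38))
```

Under this definition, a WERR of 36.2% means the fine-tuned model's WER is 36.2% lower than the baseline's in relative terms, not 36.2 absolute percentage points lower.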
