From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

TL;DR
This paper introduces a parameter-efficient neural reprogramming framework that adapts English ASR models for multilingual speech recognition, achieving competitive results with significantly fewer trainable parameters.
Contribution
It presents a novel reprogramming approach with auxiliary architectures for cross-lingual ASR, reducing training costs and outperforming existing tuning methods.
Findings
Achieves 8.1%-11.9% average WER on the seven-language Multilingual LibriSpeech (MLS) task while training only 4.2%-6.8% of the full model's parameters (11M of 270M to 45M of 660M).
Outperforms existing ASR tuning architectures and self-supervised extension methods.
Enables effective monolingual and multilingual speech recognition with large-scale pre-trained models.
Abstract
In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement that, for the first time, empowers model reprogramming on ASR. Specifically, we investigate how to select trainable components (i.e., encoder) of a conformer-based RNN-Transducer as a frozen pre-trained backbone. Experiments on a seven-language Multilingual LibriSpeech (MLS) task show that model reprogramming only requires 4.2% (11M out of 270M) to 6.8% (45M out of 660M) of its original trainable parameters from a full ASR model to achieve competitive results in a range of 11.9% to 8.1% WER averaged across…
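The general idea of model reprogramming described in the abstract, keeping a pre-trained model frozen and training only a small auxiliary component, can be illustrated with a toy sketch. This is not the paper's architecture: the single linear layer standing in for the backbone, the additive input offset, the squared-error loss, and all names (`FrozenBackbone`, `InputReprogrammer`, `demo`) are assumptions made purely for illustration.

```python
import random


class FrozenBackbone:
    """Toy stand-in for a frozen pre-trained model (hypothetical: one linear layer)."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # Weights are set once and never updated afterwards.
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def num_params(self):
        return sum(len(row) for row in self.weights)

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


class InputReprogrammer:
    """Trainable additive offset on the input features (an assumed, simplified form)."""

    def __init__(self, dim):
        self.delta = [0.0] * dim  # the only trainable parameters

    def num_params(self):
        return len(self.delta)

    def forward(self, x):
        return [xi + d for xi, d in zip(x, self.delta)]


def loss_and_grad(backbone, reprogrammer, x, target):
    """Squared error and its gradient with respect to delta only."""
    y = backbone.forward(reprogrammer.forward(x))
    err = [yi - ti for yi, ti in zip(y, target)]
    loss = sum(e * e for e in err)
    # dL/d(delta_j) = sum_i 2 * err_i * W[i][j]; no gradient is taken for W.
    grad = [sum(2.0 * e * row[j] for e, row in zip(err, backbone.weights))
            for j in range(len(x))]
    return loss, grad


def demo(steps=50, lr=0.01):
    backbone = FrozenBackbone(in_dim=8, out_dim=4)
    reprogrammer = InputReprogrammer(dim=8)
    snapshot = [row[:] for row in backbone.weights]  # to verify the freeze
    x = [0.5] * 8
    target = [1.0, 0.0, 0.0, 0.0]

    initial_loss, _ = loss_and_grad(backbone, reprogrammer, x, target)
    for _ in range(steps):
        _, grad = loss_and_grad(backbone, reprogrammer, x, target)
        # Gradient descent updates the reprogrammer; the backbone is untouched.
        reprogrammer.delta = [d - lr * g
                              for d, g in zip(reprogrammer.delta, grad)]
    final_loss, _ = loss_and_grad(backbone, reprogrammer, x, target)

    total = backbone.num_params() + reprogrammer.num_params()
    return {
        "initial_loss": initial_loss,
        "final_loss": final_loss,
        "backbone_unchanged": backbone.weights == snapshot,
        "trainable_fraction": reprogrammer.num_params() / total,
    }


if __name__ == "__main__":
    print(demo())
```

In this toy setup only the 8 offset values are trained against 32 frozen weights, which mirrors in miniature the small trainable-parameter fractions the abstract reports. The actual auxiliary architectures and the conformer-based RNN-Transducer backbone are described in the paper itself.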
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
