Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper

Tianyi Xu; Kaixun Huang; Pengcheng Guo; Yu Zhou; Longtao Huang; Hui Xue; Lei Xie

arXiv:2408.10680·cs.CL·August 21, 2024

TL;DR

This paper explores LoRA-based strategies for adapting multilingual speech models such as Whisper to new languages without access to the original training data, aiming to prevent catastrophic forgetting of the original languages while keeping training efficient.

Contribution

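It proposes using the LoRA parameters of the original model for approximate orthogonal gradient descent on new-language samples, together with a learnable rank coefficient that allocates trainable parameters more efficiently, so the model adapts to the new language while preserving performance on the original ones.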
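To make the idea of a learnable rank coefficient more concrete, here is a minimal sketch of a LoRA-style forward pass in which each rank direction carries its own scalar weight. The exact formulation used in the paper is not given on this page, so the per-rank vector `coeff`, the function `lora_forward`, and the toy matrices are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, coeff: Vector, x: Vector) -> Vector:
    """Compute W x + B diag(coeff) A x.

    w     : frozen base weight, shape (out, in)
    a     : LoRA down-projection, shape (r, in)
    b     : LoRA up-projection, shape (out, r)
    coeff : hypothetical per-rank coefficient, length r (assumed learnable)
    """
    base = matvec(w, x)
    z = matvec(a, x)                           # project input into the rank-r space
    z = [c * zi for c, zi in zip(coeff, z)]    # scale each rank direction
    delta = matvec(b, z)                       # project back to the output space
    return [bi + di for bi, di in zip(base, delta)]


# Toy example: rank 2, with the second rank direction switched off.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 3.0]

partly_on = lora_forward(W, A, B, [1.0, 0.0], x)
all_off = lora_forward(W, A, B, [0.0, 0.0], x)
```

In this reading, a coefficient driven towards zero effectively removes its rank direction, which is one plausible way such a coefficient could steer how many trainable parameters are actually used.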

Findings

01 Better adaptation results with fewer parameters

02 Effective mitigation of catastrophic forgetting

03 Improved efficiency in multilingual speech recognition

Abstract

Pre-trained multilingual speech foundation models, like Whisper, have shown impressive performance across different languages. However, adapting these models to new or specific languages is computationally expensive and suffers from catastrophic forgetting. To address these issues, our study investigates strategies to enhance the model on new languages in the absence of the original training data, while preserving the established performance on the original languages. Specifically, we first compare various LoRA-based methods to assess their vulnerability to forgetting. To mitigate this issue, we propose to leverage the LoRA parameters from the original model for approximate orthogonal gradient descent on the new samples. We also introduce a learnable rank coefficient to allocate trainable parameters for more efficient training. Our experiments with a Chinese Whisper…
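The orthogonal gradient descent step mentioned in the abstract can be illustrated with a small sketch. The general idea of orthogonal gradient descent is to remove from each new gradient the components that lie along directions considered important for earlier tasks. How the paper derives those directions from the original LoRA parameters is not spelled out on this page, so the `old_directions` vectors below are simply assumed stand-ins, and the helper names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors: List[Vector], eps: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: build an orthonormal basis for the span of `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:                 # skip directions already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad: Vector, basis: List[Vector]) -> Vector:
    """Remove from `grad` every component lying in the span of `basis`."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Assumed stand-ins for directions tied to the original languages.
old_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(old_directions)

new_grad = [0.5, -2.0, 3.0]            # gradient computed on new-language data
safe_grad = project_out(new_grad, basis)
```

An update along `safe_grad` leaves the protected directions untouched to first order, which is the intuition behind using such a projection to limit forgetting without replaying the original data.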

Taxonomy

Topics: Hate Speech and Cyberbullying Detection