Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao, Guangzhi Sun, Chao Zhang

TL;DR
This paper introduces LoRS-Merging, a low-rank and sparse model merging technique that enhances multi-lingual speech recognition and translation by reducing computational costs and mitigating language interference, outperforming multi-lingual multi-task training and other merging methods.
Contribution
The paper proposes LoRS-Merging, a novel model merging approach that efficiently combines models trained on different languages, improving performance and scalability in multi-lingual speech-to-text (S2T) tasks.
Findings
LoRS-Merging outperforms multi-task training and other merging methods by over 20% in normalized performance.
Experimental validation across 10 languages demonstrates significant improvements.
LoRS-Merging reduces computational overhead and mitigates language interference.
Abstract
Language diversity presents a significant challenge in speech-to-text (S2T) tasks, such as automatic speech recognition and translation. Traditional multi-lingual multi-task training approaches aim to address this by jointly optimising multiple speech recognition and translation tasks across various languages. While models like Whisper, built on these strategies, demonstrate strong performance, they still face issues of high computational cost, language interference, suboptimal training configurations, and limited extensibility. To overcome these challenges, we introduce LoRS-Merging (low-rank and sparse model merging), a novel technique designed to efficiently integrate models trained on different languages or tasks while preserving performance and reducing computational overhead. LoRS-Merging combines low-rank and sparse pruning to retain essential structures while eliminating…
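As a rough illustration of the idea described in the abstract, the sketch below merges several fine-tuned weight matrices by first compressing each model's difference from a shared pretrained matrix. It is not the authors' implementation: the abstract is truncated, so the order of the two steps (truncated SVD followed by magnitude pruning), the averaging rule, and all names (`low_rank_approx`, `magnitude_prune`, `lors_merge_sketch`, `rank`, `keep_ratio`, `scale`) are assumptions made for this example.

```python
import numpy as np


def low_rank_approx(delta, rank):
    """Best rank-`rank` approximation of a 2-D matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def magnitude_prune(delta, keep_ratio):
    """Keep roughly the largest-magnitude `keep_ratio` fraction of entries; zero the rest."""
    k = int(round(keep_ratio * delta.size))
    if k <= 0:
        return np.zeros_like(delta)
    if k >= delta.size:
        return delta.copy()
    # The k-th largest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(delta).ravel(), -k)[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)


def lors_merge_sketch(pretrained, finetuned, rank, keep_ratio, scale=1.0):
    """Hypothetical merge: average the compressed per-model weight differences.

    Assumption: each difference is reduced to low rank first and pruned
    second; the paper may order or combine these steps differently.
    """
    merged_delta = np.zeros_like(pretrained)
    for weights in finetuned:
        delta = weights - pretrained
        delta = magnitude_prune(low_rank_approx(delta, rank), keep_ratio)
        merged_delta += delta
    return pretrained + scale * merged_delta / len(finetuned)
```

In this sketch, `pretrained` would be one weight matrix of the shared base model and `finetuned` a list of the same matrix taken from each language-specific or task-specific model. Setting `rank` to the full rank and `keep_ratio` to 1.0 reduces the procedure to plain weight averaging.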
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Pruning
