SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

TL;DR
This paper introduces SAML, a resource-efficient speaker adaptive MoE approach that uses LoRA modules as experts. Applied to quantised ASR models, it improves performance in speaker-specific scenarios at a 7x smaller model size.
Contribution
SAML leverages low-rank adaptation modules as experts in MoE to enable effective speaker adaptation with fewer trainable parameters in compressed ASR models.
Findings
7x reduction in model size achieved
29.1% and 31.1% relative WER reductions on the quantised Whisper and Conformer-based models, respectively, in experiments on LibriSpeech and TED-LIUM 3
Effective test-time speaker adaptation in compressed models
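The relative WER reductions quoted above follow the usual definition: the drop in word error rate divided by the baseline word error rate. A minimal sketch of that arithmetic is given here; the function name and the example numbers are hypothetical and are not taken from the paper's result tables.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, for illustration only: a baseline WER of 10.0%
# that drops to 7.09% corresponds to a relative reduction of about 29.1%.
example = relative_wer_reduction(10.0, 7.09)
print(f"{example:.1%}")
```

A relative figure depends on the baseline it is measured against, so it should always be read together with the baseline system it refers to.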
Abstract
Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters in MoE. Specifically, SAML is applied to quantised and personalised end-to-end automatic speech recognition models and combined with test-time speaker adaptation to improve the performance of heavily compressed models in speaker-specific scenarios. Experiments were performed on the LibriSpeech and TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size, 29.1% and 31.1% relative word error rate reductions were achieved on the quantised Whisper model and Conformer-based…
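The idea of using LoRA modules as MoE experts can be illustrated with a small NumPy sketch: a frozen base weight is shared, and each expert contributes only a low-rank update that a gate mixes per input. This is an illustration under stated assumptions, not the paper's implementation: the softmax router, the zero initialisation of the second LoRA factor, the class name, and all dimensions are assumptions, and the summary does not say where in the ASR model the experts are placed.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MixtureOfLoRAExperts:
    """Frozen linear layer plus a gated sum of low-rank (LoRA) expert updates.

    Hypothetical sketch: the gating scheme and initialisation are assumptions.
    """

    def __init__(self, d_in, d_out, num_experts=4, rank=2, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base weight (would come from the compressed ASR model).
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        # Per-expert LoRA factors: A is small random, B starts at zero,
        # so the layer initially behaves exactly like the frozen base layer.
        self.A = rng.standard_normal((num_experts, d_in, rank)) * 0.01
        self.B = np.zeros((num_experts, rank, d_out))
        # Router that produces one mixing weight per expert (assumed softmax gate).
        self.W_gate = rng.standard_normal((d_in, num_experts)) * 0.01

    def forward(self, x):
        # x has shape (batch, d_in).
        gates = softmax(x @ self.W_gate)                 # (batch, experts)
        low = np.einsum("bi,eir->ber", x, self.A)        # (batch, experts, rank)
        delta = np.einsum("ber,ero->beo", low, self.B)   # (batch, experts, d_out)
        mixed = np.einsum("be,beo->bo", gates, delta)    # gate-weighted expert sum
        return x @ self.W + mixed

    def trainable_params(self):
        # Only the LoRA factors and the router would be updated during adaptation.
        return self.A.size + self.B.size + self.W_gate.size


layer = MixtureOfLoRAExperts(d_in=16, d_out=16, num_experts=4, rank=2)
x = np.random.default_rng(1).standard_normal((3, 16))
y = layer.forward(x)
print(y.shape, layer.trainable_params(), 4 * layer.W.size)
```

The last line prints the number of adaptable parameters next to the count that four full-size expert matrices would need, which is the saving that motivates low-rank experts. In a speaker adaptation setting, only `A`, `B` and `W_gate` would be updated while `W` stays frozen.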
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing
