SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

Qiuming Zhao; Guangzhi Sun; Chao Zhang; Mingxing Xu; Thomas Fang Zheng

arXiv:2406.19706 · cs.SD · July 1, 2024

TL;DR

This paper introduces SAML, a speaker-adaptive mixture-of-experts (MoE) approach that uses LoRA modules as experts. Combined with test-time speaker adaptation on quantised ASR models, it improves speaker-specific recognition accuracy at a 7x smaller model size.

Contribution

SAML leverages low-rank adaptation modules as experts in MoE to enable effective speaker adaptation with fewer trainable parameters in compressed ASR models.
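
To make the idea of "LoRA modules as experts" concrete, here is a minimal NumPy sketch of a linear layer whose frozen weight is augmented by a gated mixture of low-rank updates. It is an illustration only, not the authors' implementation: the class name, the dense softmax router, the per-frame gating, and every hyperparameter (number of experts, rank, alpha) are assumptions that this page does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MixtureOfLoRAExperts:
    """Hypothetical layer: frozen weight W plus a gated sum of LoRA updates."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base weight (in practice this would be the compressed model's weight).
        self.W = rng.normal(0.0, 0.02, (d_in, d_out))
        # Each expert e holds a low-rank pair (A[e], B[e]); B starts at zero,
        # as in standard LoRA, so the layer initially equals the frozen layer.
        self.A = rng.normal(0.0, 0.02, (n_experts, d_in, rank))
        self.B = np.zeros((n_experts, rank, d_out))
        # Assumed router: one linear map producing a gate per expert.
        self.router = rng.normal(0.0, 0.02, (d_in, n_experts))
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (T, d_in) frames of one utterance.
        gates = softmax(x @ self.router)                  # (T, E)
        low = np.einsum("td,edr->ter", x, self.A)         # (T, E, rank)
        delta = np.einsum("ter,ero->teo", low, self.B)    # (T, E, d_out)
        mixed = np.einsum("te,teo->to", gates, delta)     # (T, d_out)
        return x @ self.W + self.scale * mixed


layer = MixtureOfLoRAExperts(d_in=8, d_out=6)
frames = np.ones((3, 8))
print(layer(frames).shape)
```

Only `A`, `B`, and `router` would be trained in such a setup; `W` stays fixed, which is what keeps the number of trainable parameters small.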

Findings

01 7x reduction in model size for the quantised models

02 29.1% and 31.1% relative WER reductions on the quantised Whisper and Conformer-based models, respectively, in experiments on LibriSpeech and TED-LIUM 3

03 Effective test-time speaker adaptation in compressed models
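
For readers unfamiliar with the metric, a relative WER reduction is the drop in word error rate expressed as a percentage of the baseline WER. The helper below shows the arithmetic; the function name and the input values are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the formula.
print(relative_wer_reduction(20.0, 15.0))  # 25.0
```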

Abstract

Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters in MoE. Specifically, SAML is applied to quantised and personalised end-to-end automatic speech recognition models and combined with test-time speaker adaptation to improve the performance of heavily compressed models in speaker-specific scenarios. Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size, 29.1% and 31.1% relative word error rate reductions were achieved on the quantised Whisper model and Conformer-based…
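
The claim that LoRA experts reduce the number of trainable parameters in MoE can be checked with simple counting. A dense expert on a d_in x d_out projection has d_in * d_out weights, whereas a rank-r LoRA expert has r * (d_in + d_out). The sketch below compares the two; the layer sizes, expert count, and rank are illustrative assumptions, not the paper's configuration.

```python
def dense_expert_params(d_in, d_out, n_experts):
    """Trainable weights when every expert is a full d_in x d_out matrix."""
    return n_experts * d_in * d_out


def lora_expert_params(d_in, d_out, n_experts, rank):
    """Trainable weights when every expert is a rank-`rank` LoRA pair (A, B)."""
    return n_experts * rank * (d_in + d_out)


# Illustrative sizes only (assumed, not taken from the paper).
d_in = d_out = 512
n_experts, rank = 4, 8

dense = dense_expert_params(d_in, d_out, n_experts)
lora = lora_expert_params(d_in, d_out, n_experts, rank)
print(dense, lora, dense // lora)  # 1048576 32768 32
```

Under these assumed sizes the LoRA experts need 32 times fewer trainable weights than dense experts; the ratio grows as the rank shrinks relative to the layer width.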

Code & Models

Repositories

qmgzhao/SAML · jax · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing