Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

TL;DR
This paper introduces GLoRA, a gated low-rank adaptation method for personalized, efficient code-switching speech recognition on low-spec devices, demonstrating superior performance on Korean-English datasets.
Contribution
It proposes a novel GLoRA method for parameter-efficient fine-tuning and a weights separation approach for on-device models, specifically addressing code-switching recognition.
Findings
GLoRA outperforms traditional LoRA in fine-tuning performance.
Fine-tuned models surpass models trained from scratch in code-switching recognition.
The approach enables efficient on-device personalized speech recognition.
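The summary above does not spell out how GLoRA's gate is defined, so the following is only a minimal sketch of the general idea of a gated low-rank update, not the authors' implementation. It assumes a single learnable scalar gate passed through a sigmoid, and it reads "weights separation" as keeping the frozen base weights shared while only the small adapter (A, B, and the gate) is stored per user. The class name GatedLowRankLinear, the helper matvec, and all parameter names are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class GatedLowRankLinear:
    """Frozen base weight W plus a gated low-rank update: W x + g * B (A x).

    Assumption: g = sigmoid(gate_logit) is one learnable scalar. The source
    does not specify the gate used by GLoRA; this form is illustrative only.
    """

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # shared base weights, never updated
        # A gets a small random init; B starts at zero so the update is
        # initially zero, as in standard LoRA.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.gate_logit = 0.0  # sigmoid(0) = 0.5

    def forward(self, x):
        gate = 1.0 / (1.0 + math.exp(-self.gate_logit))
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + gate * d for b, d in zip(base, delta)]

    def adapter_param_count(self):
        """Parameters that would be stored per user: A, B, and the gate."""
        rank, d_in = len(self.A), len(self.A[0])
        d_out = len(self.B)
        return rank * d_in + d_out * rank + 1

    def base_param_count(self):
        """Parameters in the shared, frozen base weight."""
        return len(self.W) * len(self.W[0])


# Usage: a 4x4 identity base layer with a rank-1 adapter.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = GatedLowRankLinear(W, rank=1)
y = layer.forward([1.0, 2.0, 3.0, 4.0])
```

Because B is initialized to zero, the adapted layer initially reproduces the base layer's output. In this toy case the per-user adapter holds 9 parameters against 16 in the base weight; the gap widens quickly for realistic layer sizes, which is the motivation for storing only the adapter per user.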
Abstract
In recent times, there has been growing interest in running personalized large models on low-spec devices, such as mobile and CPU-only devices. However, running a personalized large model on-device is inefficient and sometimes infeasible because of the computational cost. To tackle this problem, this paper presents a weights separation method that minimizes on-device model weights using parameter-efficient fine-tuning methods. Moreover, some people speak multiple languages within a single utterance, a phenomenon known as code-switching, and a personalized ASR model is necessary to address such cases. However, current multilingual speech recognition models are limited to recognizing a single language within each utterance. To tackle this problem, we propose code-switching speech recognition models that incorporate fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce a…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
