Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

Gwantae Kim; Bokyeung Lee; Donghyeon Kim; Hanseok Ko

arXiv:2406.02562 · eess.AS · June 6, 2024

TL;DR

This paper introduces GLoRA, a gated low-rank adaptation method for personalized, efficient code-switching speech recognition on low-spec devices, demonstrating superior performance on Korean-English datasets.

Contribution

It proposes a novel GLoRA method for parameter-efficient fine-tuning and a weights separation approach for on-device models, specifically addressing code-switching recognition.
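As a rough illustration of why low-rank adapters keep the per-user footprint small, the sketch below shows a LoRA-style layer with a gate on the low-rank update. This summary does not give the paper's actual gate formulation, so the single scalar `gate` and the helper names `matvec`, `gated_lora_forward`, and `adapter_param_count` are assumptions made for illustration only. The frozen base weight `W` stands in for the part of the model that is not personalized, while `A`, `B`, and `gate` stand in for the small personalized part.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def gated_lora_forward(W, A, B, gate, x):
    """Compute y = W x + gate * B (A x).

    W    : frozen base weight, shape (d_out, d_in)
    A    : trainable down-projection, shape (r, d_in)
    B    : trainable up-projection, shape (d_out, r)
    gate : scalar scaling the low-rank update (assumed form, not from the paper)
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + gate * u for b, u in zip(base, update)]


def adapter_param_count(d_in, d_out, r):
    """Parameters in the low-rank pair (A and B) plus one scalar gate."""
    return r * d_in + d_out * r + 1


# Toy dimensions chosen only for the example.
d_in, d_out, r = 8, 6, 2
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]
B = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

y_base = gated_lora_forward(W, A, B, 0.0, x)   # gate closed: base model only
y_adapted = gated_lora_forward(W, A, B, 1.0, x)  # gate open: full low-rank update

full_params = d_in * d_out
adapter_params = adapter_param_count(d_in, d_out, r)
print(f"full layer: {full_params} params, adapter: {adapter_params} params")
```

With the gate at zero the layer reproduces the frozen base output exactly, and the adapter holds `r * (d_in + d_out) + 1` values instead of `d_in * d_out`, which is the kind of saving that makes storing only the personalized part attractive on a constrained device.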

Findings

01. GLoRA outperforms traditional LoRA in fine-tuning performance.

02. Fine-tuned models surpass models trained from scratch in code-switching recognition.

03. The approach enables efficient on-device personalized speech recognition.

Abstract

Recently, there has been growing interest in running personalized large models on low-spec devices, such as mobile and CPU-only devices. However, running a personalized large model on-device is inefficient and sometimes limited by computational cost. To tackle this problem, this paper presents a weights separation method that minimizes on-device model weights using parameter-efficient fine-tuning methods. Moreover, some people speak multiple languages within a single utterance, a phenomenon known as code-switching, and a personalized ASR model is needed to handle such cases. However, current multilingual speech recognition models are limited to recognizing a single language within each utterance. To tackle this problem, we propose code-switching speech recognition models that incorporate fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce a…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis