Sparsely Shared LoRA on Whisper for Child Speech Recognition

Wei Liu; Ying Qin; Zhiyuan Peng; Tan Lee

arXiv:2309.11756 · eess.AS · January 9, 2024 · 1 citation

TL;DR

This paper introduces S2-LoRA, a sparse sharing method for parameter-efficient fine-tuning (PEFT) of Whisper, inspired by the rank distribution that AdaLoRA learns. On low-resource child speech recognition, it uses fewer trainable parameters than AdaLoRA and generalizes better to out-of-domain data.

Contribution

The paper proposes S2-LoRA, a novel PEFT approach that shares low-rank matrices sparsely, enhancing adaptation efficiency and out-of-domain performance for Whisper on child speech.
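The sharing idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the listing does not spell out the mechanism, so the sketch assumes that one pair of low-rank factors is reused by every adapted weight matrix, that each matrix keeps only its own rank-coefficient vector, and that an L1 penalty encourages sparsity in those coefficients. All names and sizes (`A_shared`, `B_shared`, `coeffs`, `l1_penalty`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_modules, d_out, d_in, r = 4, 8, 8, 4  # hypothetical sizes

# Assumption: one pair of low-rank factors shared by all adapted weight matrices.
A_shared = rng.standard_normal((r, d_in))
B_shared = rng.standard_normal((d_out, r))

# Assumption: each module keeps only its own rank-coefficient vector.
coeffs = rng.standard_normal((n_modules, r))

def module_delta(m):
    """Weight update for module m: B_shared @ diag(coeffs[m]) @ A_shared."""
    return (B_shared * coeffs[m]) @ A_shared

def l1_penalty(c, weight=1e-3):
    """L1 regulariser that pushes rank coefficients toward zero (an assumed choice)."""
    return weight * np.abs(c).sum()

# Trainable parameters: separate LoRA factors per module vs. shared factors.
params_per_module_lora = n_modules * r * (d_in + d_out)
params_shared = r * (d_in + d_out) + n_modules * r
print(params_per_module_lora, params_shared)
```

With these toy sizes, the shared variant trains far fewer parameters than per-module factors because only the short coefficient vectors grow with the number of adapted matrices.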

Findings

01 S2-LoRA achieves comparable in-domain performance to AdaLoRA with fewer parameters.

02 S2-LoRA exhibits better out-of-domain generalization.

03 Learned rank distribution in S2-LoRA resembles AdaLoRA's pattern.

Abstract

Whisper is a powerful automatic speech recognition (ASR) model. Nevertheless, its zero-shot performance on low-resource speech requires further improvement. Child speech, a representative type of low-resource speech, is used here for adaptation. Recently, parameter-efficient fine-tuning (PEFT) in NLP was shown to be comparable to, and even better than, full fine-tuning while tuning only a small set of trainable parameters. However, the effectiveness of current PEFT methods on Whisper has not been well examined. In this paper, only parameter-composition types of PEFT approaches, such as LoRA and BitFit, are investigated, as they do not bring extra inference costs. Several popular PEFT methods are examined. In particular, we compare LoRA and AdaLoRA and find that the learnable rank coefficient is a good design. Inspired by the sparse rank distribution allocated by AdaLoRA, a…
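To make the "no extra inference cost" point concrete, here is a small NumPy sketch of a parameter-composition adapter. It assumes a low-rank update of the form B · diag(c) · A, where c is a learnable rank-coefficient vector in the spirit of AdaLoRA's diagonal scaling; plain LoRA corresponds to c being all ones. The function name `lora_delta`, the sizes, and the coefficient values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lora_delta(A, B, coeff=None):
    """Low-rank update B @ diag(coeff) @ A; coeff=None gives plain LoRA (all ones)."""
    if coeff is None:
        coeff = np.ones(A.shape[0])
    return (B * coeff) @ A  # scales column j of B by coeff[j]

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 4                 # hypothetical sizes
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
coeff = np.array([1.0, 0.0, 0.5, 0.0])   # sparse rank coefficients (made-up values)

x = rng.standard_normal(d_in)

# During training the adapter runs as a side branch next to the frozen weight.
y_branch = W @ x + B @ (coeff * (A @ x))

# For inference the update is folded into the weight, so only one matmul remains.
delta = lora_delta(A, B, coeff)
W_merged = W + delta
y_merged = W_merged @ x

print(np.allclose(y_branch, y_merged))
```

Because two of the four coefficients are zero, the update has rank at most 2, which is how a sparse coefficient vector reduces the effective rank of an individual weight matrix.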

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing