Sparsely Shared LoRA on Whisper for Child Speech Recognition
Wei Liu, Ying Qin, Zhiyuan Peng, Tan Lee

TL;DR
This paper introduces S2-LoRA, a parameter-efficient fine-tuning (PEFT) method for Whisper that shares low-rank matrices sparsely. Inspired by AdaLoRA's rank distribution, it improves low-resource child speech recognition with fewer parameters and better generalization.
Contribution
The paper proposes S2-LoRA, a novel PEFT approach that shares low-rank matrices sparsely, enhancing adaptation efficiency and out-of-domain performance for Whisper on child speech.
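To make "shares low-rank matrices sparsely" concrete, here is a minimal NumPy sketch of one plausible reading. It is not the authors' implementation. It assumes a single pair of low-rank factors (`B_shared`, `A_shared`) reused by every adapted weight matrix, with each layer keeping only its own rank-coefficient vector. The soft-thresholding step is swapped in purely to illustrate sparsity and is not taken from the paper. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen for illustration only.
d_out, d_in, rank, n_layers = 8, 8, 4, 3

# Assumption: one pair of low-rank factors shared by all adapted layers.
B_shared = rng.normal(size=(d_out, rank))
A_shared = rng.normal(size=(rank, d_in))

# Assumption: each layer owns only a vector of rank coefficients.
coeffs = rng.normal(size=(n_layers, rank))


def soft_threshold(s, lam):
    """Shrink entries toward zero and zero out the small ones (L1 proximal step)."""
    return np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)


def delta_weight(layer_coeffs):
    """Layer-specific update B @ diag(s) @ A built from the shared factors."""
    return B_shared @ np.diag(layer_coeffs) @ A_shared


# Illustrative sparsification; the paper's actual mechanism may differ.
sparse_coeffs = soft_threshold(coeffs, lam=0.5)
deltas = [delta_weight(s) for s in sparse_coeffs]

# Trainable-parameter count: shared factors plus per-layer coefficients,
# compared with giving every layer its own pair of factors.
shared_params = B_shared.size + A_shared.size + coeffs.size
per_layer_lora_params = n_layers * (B_shared.size + A_shared.size)
print(shared_params, per_layer_lora_params)
```

Under these assumptions, each additional adapted layer costs only `rank` extra parameters, and a zeroed coefficient removes that rank component from the layer's update.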
Findings
S2-LoRA achieves comparable in-domain performance to AdaLoRA with fewer parameters.
S2-LoRA exhibits better out-of-domain generalization.
Learned rank distribution in S2-LoRA resembles AdaLoRA's pattern.
Abstract
Whisper is a powerful automatic speech recognition (ASR) model. Nevertheless, its zero-shot performance on low-resource speech requires further improvement. Child speech, as a representative type of low-resource speech, is leveraged for adaptation. Recently, parameter-efficient fine-tuning (PEFT) in NLP was shown to be comparable to, and even better than, full fine-tuning, while only needing to tune a small set of trainable parameters. However, current PEFT methods have not been well examined for their effectiveness on Whisper. In this paper, only parameter composition types of PEFT approaches such as LoRA and BitFit are investigated, as they do not bring extra inference costs. Different popular PEFT methods are examined. In particular, we compare LoRA and AdaLoRA and find that the learnable rank coefficient is a good design. Inspired by the sparse rank distribution allocated by AdaLoRA, a…
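The remark that parameter-composition methods such as LoRA add no inference cost can be checked with a small NumPy sketch. It uses the standard LoRA formulation, in which a low-rank update is added to a frozen weight. The sizes and the `alpha` scaling value are hypothetical, and nothing here is specific to Whisper or to the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and scaling chosen for illustration only.
d_out, d_in, rank = 6, 5, 2
alpha = 4.0  # hypothetical LoRA scaling hyperparameter
scale = alpha / rank

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight
B = rng.normal(size=(d_out, rank))  # trainable low-rank factor
A = rng.normal(size=(rank, d_in))   # trainable low-rank factor
x = rng.normal(size=(3, d_in))      # batch of three input vectors

# During training: frozen path plus a separate low-rank adapter path.
y_adapter = x @ W.T + scale * (x @ A.T) @ B.T

# For inference: fold the update into the weight once, then use one matmul.
W_merged = W + scale * (B @ A)
y_merged = x @ W_merged.T

print(np.allclose(y_adapter, y_merged))
```

Because `W_merged` has the same shape as `W`, the adapted layer runs with exactly the same number of operations as the original one.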
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
