Selective Attention Merging for low resource tasks: A case study of Child ASR
Natarajan Balaji Shankar, Zilai Wang, Eray Eren, Abeer Alwan

TL;DR
This paper introduces Selective Attention (SA) Merge, a model merging technique that improves low-resource child ASR by leveraging speech models trained on larger corpora, achieving up to 14% relative WER reduction and a state-of-the-art WER of 8.69 on the MyST database.
Contribution
The paper proposes Selective Attention (SA) Merge, a method that selectively merges task vectors from attention matrices to improve Speech Foundation Model performance on low-resource speech recognition tasks.
Findings
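Up to 14% relative WER reduction on the MyST database, outperforming existing model merging and data augmentation techniques
State-of-the-art WER of 8.69 on MyST for the Whisper-small model, achieved by combining SA Merge with data augmentation
Data augmentation and model merging are complementary: the best result uses both together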
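The "relative WER reduction" quoted above is a ratio against a baseline rather than an absolute difference in WER points. A minimal sketch of that arithmetic follows; the function name and the numbers in the usage note are hypothetical illustrations, not values reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    A positive result means the new system is better than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For example, with a hypothetical baseline WER of 10.0 and a hypothetical new WER of 8.6, `relative_wer_reduction(10.0, 8.6)` is roughly 14, even though the absolute drop is only 1.4 WER points.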
Abstract
While Speech Foundation Models (SFMs) excel in various speech tasks, their performance on low-resource tasks such as child Automatic Speech Recognition (ASR) is hampered by limited pretraining data. To address this, we explore different model merging techniques to leverage knowledge from models trained on larger, more diverse speech corpora. This paper also introduces Selective Attention (SA) Merge, a novel method that selectively merges task vectors from attention matrices to enhance SFM performance on low-resource tasks. Experiments on the MyST database show relative word error rate (WER) reductions of up to 14%, outperforming existing model merging and data augmentation techniques. By combining data augmentation techniques with SA Merge, we achieve a new state-of-the-art WER of 8.69 on the MyST database for the Whisper-small model, highlighting the potential of SA Merge for…
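The idea of merging task vectors only in attention matrices can be illustrated with a small sketch. A task vector here is taken to be the difference between fine-tuned and base weights. The abstract does not give the merge rule, so everything else below is an assumption made for illustration: the linear interpolation with coefficient `lam`, the `"attn"` name marker used to pick attention parameters, the choice to keep the child-tuned weights elsewhere, and the flat list-of-floats weight format. It should not be read as the authors' implementation.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # hypothetical format: parameter name -> flat weights


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Task vector = fine-tuned weights minus base weights, per parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def selective_attention_merge(
    base: Weights,
    child_ft: Weights,
    donor_ft: Weights,
    lam: float = 0.5,          # assumed interpolation coefficient
    attn_marker: str = "attn"  # assumed substring identifying attention parameters
) -> Weights:
    """Merge task vectors in attention parameters only (illustrative sketch).

    Attention parameters: base + lam * tv_child + (1 - lam) * tv_donor.
    All other parameters: copied from the child fine-tuned model (assumption).
    """
    tv_child = task_vector(child_ft, base)
    tv_donor = task_vector(donor_ft, base)
    merged: Weights = {}
    for name, w in base.items():
        if attn_marker in name:
            merged[name] = [
                b + lam * c + (1.0 - lam) * d
                for b, c, d in zip(w, tv_child[name], tv_donor[name])
            ]
        else:
            merged[name] = list(child_ft[name])
    return merged
```

In this sketch, `child_ft` stands for a model fine-tuned on the low-resource child speech data and `donor_ft` for one fine-tuned on a larger corpus, both starting from the same `base` checkpoint so that the task vectors are comparable.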
Taxonomy
Topics: EEG and Brain-Computer Interfaces
Methods: Softmax · Attention Is All You Need
