Group-Aware Partial Model Merging for Children's Automatic Speech Recognition
Thomas Rolland, Alberto Abad

TL;DR
This paper presents GRAPAM (GRoup-Aware PARtial model Merging), a technique for children's speech recognition that clusters children's data by acoustic similarity, partially fine-tunes an adult pre-trained model on each cluster, and merges the resulting models. It outperforms full fine-tuning while training fewer parameters.
Contribution
It introduces a group-aware partial model merging method for children's ASR that combines unsupervised clustering, partial fine-tuning, and parameter-level model merging, and that outperforms full fine-tuning.
Findings
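Achieves a 6% relative WER improvement on the MyST children's speech corpus, using the same amount of data.
Outperforms full fine-tuning while training fewer parameters.
Targets group-specific characteristics and variation among children, which standard supervised fine-tuning often fails to capture.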
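For readers unfamiliar with the metric, a relative WER improvement is the drop in word error rate divided by the baseline's word error rate. The sketch below shows the arithmetic only. The function name and the two WER values are hypothetical, chosen for illustration, and are not figures reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (in percent), NOT taken from the paper.
example = relative_wer_reduction(baseline_wer=10.0, new_wer=9.4)
print(f"relative WER reduction: {example:.1%}")
```

Note that a relative improvement is not the same as an absolute one: in this made-up example the absolute drop is 0.6 WER points.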
Abstract
While supervised fine-tuning of adult pre-trained models for children's ASR has shown promise, it often fails to capture group-specific characteristics and variations among children. To address this, we introduce GRoup-Aware PARtial model Merging (GRAPAM), a parameter-efficient approach that combines unsupervised clustering, partial fine-tuning, and model merging. Our approach adapts adult pre-trained models to children by first grouping the children's data based on acoustic similarity. Each group is used to partially fine-tune an adult pre-trained model, and the resulting models are merged at the parameter level. Experiments conducted on the MyST children's speech corpus indicate that GRAPAM achieves a relative WER improvement of 6%, using the same amount of data, outperforming full fine-tuning while training fewer parameters.
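The three stages described in the abstract (cluster, partially fine-tune per group, merge parameters) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which clustering algorithm, which parameter subset, or which merging rule is used, so the sketch assumes k-means on stand-in utterance embeddings, a toy two-part linear model in which only a `head` vector is trained, and uniform averaging as the merge. All names and data here are hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means; returns one cluster index per row of `features`."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every utterance embedding to every centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels


def partial_finetune(pretrained, x, y, lr=0.005, steps=50):
    """Gradient descent on mean squared error, updating ONLY `head`.

    `encoder` stays frozen, which is what makes the fine-tuning partial.
    """
    params = {name: value.copy() for name, value in pretrained.items()}
    for _ in range(steps):
        hidden = x @ params["encoder"]
        error = hidden @ params["head"] - y
        params["head"] -= lr * 2.0 * hidden.T @ error / len(x)
    return params


def merge(models):
    """Assumed merge rule: uniform average of each parameter across models."""
    return {name: np.mean([m[name] for m in models], axis=0) for name in models[0]}


# Synthetic stand-ins for utterance embeddings, targets, and an adult model.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 4))
targets = rng.normal(size=60)
pretrained = {"encoder": rng.normal(size=(4, 4)), "head": rng.normal(size=4)}

labels = kmeans(features, k=3)
group_models = [
    partial_finetune(pretrained, features[labels == g], targets[labels == g])
    for g in range(3)
    if np.any(labels == g)
]
merged = merge(group_models)
```

Because every group model starts from the same pre-trained weights and leaves `encoder` untouched, averaging changes only the fine-tuned `head`. A real system would replace the toy model with a pre-trained ASR network and might use a different merging rule than a uniform average.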
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Face recognition and analysis
