Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR
Zilai Wang, Natarajan Balaji Shankar, Kaiyuan Zhang, Zihan Wang, Abeer Alwan

TL;DR
This paper introduces delta SSL embeddings, defined as the difference between a finetuned speech model's embeddings and those of its pretrained counterpart, to improve child speech recognition accuracy, achieving state-of-the-art results on the MyST corpus.
Contribution
The paper proposes delta embedding fusion, which combines the finetuned features of one SSL model with the delta embeddings of another, and reports that it outperforms fusion of finetuned embeddings alone for child ASR.
Findings
Delta embedding fusion with WavLM reduces WER by up to 10% relative for HuBERT and 4.4% for W2V2, compared with finetuned embedding fusion.
Fusing WavLM with delta W2V2 achieves a WER of 9.64.
Delta embeddings appear to encode task-specific information about child speech that complements the finetuned features of another SSL model.
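To make the "relative" figures above concrete: a relative WER reduction is the drop in WER divided by the baseline WER. The sketch below assumes WER values given in percent; the function name is hypothetical and the numbers are illustrative only, not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not taken from the paper):
# the same 1-point absolute drop is a different relative reduction
# depending on the baseline it is measured against.
print(relative_wer_reduction(10.0, 9.0))
print(relative_wer_reduction(20.0, 19.0))
```

Because the denominator is the baseline, relative reductions are only comparable when the baselines are stated, which is why the findings name finetuned embedding fusion as the reference point.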
Abstract
Self-supervised learning (SSL) models have achieved impressive results across many speech tasks, yet child automatic speech recognition (ASR) remains challenging due to limited data and pretraining domain mismatch. Fine-tuning SSL models on child speech induces shifts in the representation space. We hypothesize that delta SSL embeddings, defined as the differences between embeddings from a finetuned model and those from its pretrained counterpart, encode task-specific information that complements finetuned features from another SSL model. We evaluate multiple fusion strategies on the MyST children's corpus using different models. Results show that delta embedding fusion with WavLM yields up to a 10 percent relative WER reduction for HuBERT and a 4.4 percent reduction for W2V2, compared to finetuned embedding fusion. Notably, fusing WavLM with delta W2V2 embeddings achieves a WER of 9.64,…
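The definition in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: embeddings are stood in for by plain lists of per-frame vectors, the function names are hypothetical, and concatenation along the feature dimension is assumed as just one possible fusion strategy (the abstract says several were evaluated but does not name them here).

```python
from typing import List

Frames = List[List[float]]  # one embedding vector per time frame


def delta_embedding(finetuned: Frames, pretrained: Frames) -> Frames:
    """Frame-wise difference: finetuned embedding minus pretrained embedding."""
    if len(finetuned) != len(pretrained):
        raise ValueError("frame counts differ")
    delta = []
    for ft_vec, pt_vec in zip(finetuned, pretrained):
        if len(ft_vec) != len(pt_vec):
            raise ValueError("embedding dimensions differ")
        delta.append([f - p for f, p in zip(ft_vec, pt_vec)])
    return delta


def concat_fusion(primary: Frames, delta: Frames) -> Frames:
    """Assumed fusion strategy: concatenate the two vectors for each frame."""
    if len(primary) != len(delta):
        raise ValueError("frame counts differ")
    return [a + b for a, b in zip(primary, delta)]


# Toy stand-ins for real model outputs (2 frames each).
wavlm_finetuned = [[1.0, 2.0], [3.0, 4.0]]
w2v2_pretrained = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
w2v2_finetuned = [[1.0, 0.0, 0.5], [1.5, 2.0, 1.0]]

delta_w2v2 = delta_embedding(w2v2_finetuned, w2v2_pretrained)
fused = concat_fusion(wavlm_finetuned, delta_w2v2)
print(len(fused), len(fused[0]))
```

In practice the two models would need frame-aligned outputs for the same utterance; the sketch simply raises an error when the frame counts do not match.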
Taxonomy
Topics: Speech Recognition and Synthesis · Language Development and Disorders · Emotion and Mood Recognition
