Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
Christiaan Jacobs, Annelien Smith, Daleen Klop, Ondřej Klejch, Febe de Wet, Herman Kamper

TL;DR
This paper develops and evaluates automatic speech recognition systems for oral narratives told by Afrikaans and isiXhosa preschool children, showing that in-domain adult data and semi-supervised learning improve recognition in these low-resource, under-studied languages.
Contribution
It introduces the first ASR systems for preschool Afrikaans and isiXhosa child speech, comparing a range of prior child-speech ASR strategies and highlighting the effectiveness of in-domain adult data and semi-supervised learning.
Findings
In-domain adult data gives the biggest improvement, especially when coupled with voice conversion.
Semi-supervised learning benefits both languages.
Parameter-efficient fine-tuning helps Afrikaans but not isiXhosa.
Abstract
We develop automatic speech recognition (ASR) systems for stories told by Afrikaans and isiXhosa preschool children. Oral narratives provide a way to assess children's language development before they learn to read. We consider a range of prior child-speech ASR strategies to determine which is best suited to this unique setting. Using Whisper and only 5 minutes of transcribed in-domain child speech, we find that additional in-domain adult data (adult speech matching the story domain) provides the biggest improvement, especially when coupled with voice conversion. Semi-supervised learning also helps for both languages, while parameter-efficient fine-tuning helps on Afrikaans but not on isiXhosa (which is under-represented in the Whisper model). Few child-speech studies look at non-English data, and even fewer at the preschool ages of 4 and 5. Our work therefore represents a unique…
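The semi-supervised idea mentioned in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' pipeline: the confidence threshold, the `transcribe` callable, and both function names are hypothetical, chosen only to show how a small transcribed set might be extended with automatically labelled recordings before further fine-tuning.

```python
from typing import Callable, List, Tuple

# (transcript, confidence in [0, 1]); a hypothetical stand-in for an ASR model's output
Hypothesis = Tuple[str, float]


def select_pseudo_labels(
    unlabeled: List[str],
    transcribe: Callable[[str], Hypothesis],
    min_confidence: float = 0.9,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str]]:
    """Return (utterance_id, transcript) pairs that pass the confidence filter."""
    selected = []
    for utt_id in unlabeled:
        text, confidence = transcribe(utt_id)
        # Keep only non-empty transcripts the model is reasonably sure about.
        if confidence >= min_confidence and text.strip():
            selected.append((utt_id, text))
    return selected


def build_training_set(
    labeled: List[Tuple[str, str]],
    unlabeled: List[str],
    transcribe: Callable[[str], Hypothesis],
    min_confidence: float = 0.9,
) -> List[Tuple[str, str]]:
    """Combine the small transcribed set with confident pseudo-labels."""
    return list(labeled) + select_pseudo_labels(unlabeled, transcribe, min_confidence)
```

In a setting like the one described, `labeled` would hold the few minutes of transcribed child speech and `transcribe` would wrap the current model; the combined set would then be used for another round of fine-tuning.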
