Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech

Gus Lathouwers; Lingyun Gao; Catia Cucchiarini; Helmer Strik

arXiv:2604.19801·eess.AS·April 23, 2026


TL;DR

This paper introduces two novel utterance-level methods for identifying reliable ASR output for child speech, one for read speech and one for dialogue speech. On English and Dutch datasets, the best strategy selects utterances with precision above 97.4%.

Contribution

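The work presents two new approaches for selecting reliably transcribed child speech at the utterance level, one for read speech and one for dialogue speech, so that the high ASR error rates that limit child-speech applications can be mitigated in advance.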

Findings

01

High precision (over 97.4%) in selecting reliable utterances.

02

Automatic selection of 21.0% to 55.9% of data with less than 2.6% error rate.

03

Effective across both read and dialogue speech in two languages.
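
The precision and selection figures above can be related with a small worked sketch. This is not the authors' selection method, which the summary does not describe. It only assumes that precision means the share of selected utterances whose ASR output is correct, that the selected share is the fraction of all utterances kept, and that the error rate among selected utterances is one minus precision (which is how 97.4% precision corresponds to a 2.6% error rate). The `Utterance` class and `selection_metrics` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    # Hypothetical fields, not taken from the paper.
    selected: bool  # the selection method marked this ASR output as reliable
    correct: bool   # the ASR output matches the reference transcript


def selection_metrics(utterances: list[Utterance]) -> dict[str, Optional[float]]:
    """Compute selected share, precision, and error rate among selected utterances."""
    selected = [u for u in utterances if u.selected]
    if not selected:
        # Nothing selected: precision and error rate are undefined.
        return {"selected_share": 0.0, "precision": None, "error_rate": None}
    n_correct = sum(u.correct for u in selected)
    precision = n_correct / len(selected)
    return {
        "selected_share": len(selected) / len(utterances),
        "precision": precision,
        "error_rate": 1 - precision,  # assumed definition: share of selected outputs with errors
    }


# Toy data: 10 utterances, 4 selected, 3 of those 4 correctly transcribed.
toy = (
    [Utterance(selected=True, correct=True)] * 3
    + [Utterance(selected=True, correct=False)]
    + [Utterance(selected=False, correct=True)] * 2
    + [Utterance(selected=False, correct=False)] * 4
)
metrics = selection_metrics(toy)
```

On this toy data the selected share is 4 of 10 utterances and precision is 3 of 4. The reported results correspond to a much stricter operating point, where fewer than 2.6% of the selected utterances contain errors.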

Abstract

Automatic Speech Recognition (ASR) is increasingly used in applications involving child speech, such as language learning and literacy acquisition. However, the effectiveness of such applications is limited by high ASR error rates. The negative effects can be mitigated by identifying in advance which ASR-outputs are reliable. This work aims to develop two novel approaches for selecting reliable ASR-output at the utterance level, one for selecting reliable read speech and one for dialogue speech material. Evaluations were done on an English and a Dutch dataset, each with a baseline and finetuned model. The results show that utterance-level selection methods for identifying reliably transcribed speech recordings have high precision for the best strategy (P > 97.4) for both read speech and dialogue material, for both languages. Using the current optimal strategy allows 21.0% to 55.9% of…
