Enabling automatic transcription of child-centered audio recordings from real-world environments

Daniil Kocharov; Okko Räsänen

arXiv:2506.11747·cs.SD·June 16, 2025

Open Access

TL;DR

This paper introduces a method that automatically identifies the speech segments in noisy, longform child-centered audio recordings that can be transcribed reliably, enabling scalable linguistic analysis with high accuracy on the selected portions of speech.

Contribution

The paper presents an approach for detecting transcribable speech segments in longform audio, which improves transcription quality on the selected segments and enables detailed linguistic analysis of child-centered recordings.

Findings

01

Median WER of 0% on selected segments

02

Transcription of 13% of speech with 18% mean WER

03

High correlation (r=0.92) between automatic and manual word frequencies
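
A median WER of 0% alongside a mean WER of 18% is not a contradiction: the median is 0 whenever more than half of the selected segments are transcribed without error, while a minority of segments with errors pulls the mean up. The sketch below illustrates this with the standard word-level edit-distance definition of WER. The function name `word_error_rate` and the utterance pairs are hypothetical, invented for illustration; they are not taken from the paper, and the paper's exact scoring setup (for example, text normalization) is not specified here.

```python
from statistics import mean, median


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical (reference, hypothesis) pairs, invented for illustration only.
pairs = [
    ("where is the ball", "where is the ball"),
    ("do you want some milk", "do you want some milk"),
    ("look at the doggy", "look at the doggy"),
    ("put it on the table", "put it in a table"),
]

wers = [word_error_rate(ref, hyp) for ref, hyp in pairs]
print("per-segment WER:", wers)
print("median WER:", median(wers))
print("mean WER:", mean(wers))
```

In this toy set, three of the four segments are error-free, so the median is zero even though the mean is not.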

Abstract

Longform audio recordings obtained with microphones worn by children (also known as child-centered daylong recordings) have become a standard method for studying children's language experiences and their impact on subsequent language development. Transcripts of longform speech audio would enable rich analyses at various linguistic levels, yet the massive scale of typical longform corpora prohibits comprehensive manual annotation. At the same time, automatic speech recognition (ASR)-based transcription faces significant challenges due to the noisy, unconstrained nature of real-world audio, and no existing study has successfully applied ASR to transcribe such data. However, previous attempts have assumed that ASR must process each longform recording in its entirety. In this work, we present an approach to automatically detect those utterances in longform audio that can be reliably…

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies