Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity

Loann Peurey; Marvin Lavechin; Tarek Kunze; Manel Khentout; Lucas Gautheron; Emmanuel Dupoux; Alejandrina Cristia

arXiv:2506.11075 · eess.AS · September 3, 2025

TL;DR

This paper reviews the use of child-worn long-form audio recordings in language research, discussing their benefits, challenges, and strategies for improving data quality and analysis accuracy.

Contribution

It provides a comprehensive overview of long-form child audio recordings, highlights sources of error, and offers practical troubleshooting strategies for researchers.

Findings

01

Long-form recordings collected over whole days promise to capture children's input and production with minimal observer bias, and therefore high validity.

02

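The volume of data requires automated analysis, but several sources of error threaten the accuracy of automated annotations and the interpretation of the resulting metrics.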

03

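A fully automated quality control system is not feasible, but troubleshooting metrics and practical strategies can help researchers assess data quality and improve data collection.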

Abstract

Audio-recordings collected with a child-worn device are a fundamental tool in child language research. Long-form recordings collected over whole days promise to capture children's input and production with minimal observer bias, and therefore high validity. The sheer volume of resulting data necessitates automated analysis to extract relevant metrics for researchers and clinicians. This paper summarizes collective knowledge on this technique, providing entry points to existing resources. We also highlight various sources of error that threaten the accuracy of automated annotations and the interpretation of resulting metrics. To address this, we propose potential troubleshooting metrics to help users assess data quality. While a fully automated quality control system is not feasible, we outline practical strategies for researchers to improve data collection and contextualize their…
