Employing self-supervised learning models for cross-linguistic child speech maturity classification

Theo Zhang; Madurya Suresh; Anne S. Warlaumont; Kasia Hitczenko; Alejandrina Cristia; Margaret Cychosz

arXiv:2506.08999 · cs.CL · June 11, 2025

TL;DR

This paper introduces a large, diverse dataset of child vocalizations and demonstrates that self-supervised transformer models trained on it can accurately classify different types of child speech, outperforming previous models and approaching human-level accuracy.

Contribution

The study presents the SpeechMaturity dataset with over 242,000 vocalizations across 25+ languages, and shows that self-supervised models trained on this data significantly improve child speech classification.

Findings

01. Models outperform previous state-of-the-art classifiers.
02. Models achieve accuracy comparable to human judgments.
03. Models perform robustly across different environments.
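
One way to make "comparable to human judgments" concrete is to score the model and a second human annotator against the same reference labels with the same metrics. The sketch below assumes plain accuracy and unweighted average recall (UAR, the mean of per-class recall); this summary does not state which metrics the paper actually reports, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of items where the predicted label matches the reference label."""
    assert len(reference) == len(predicted)
    hits = sum(r == p for r, p in zip(reference, predicted))
    return hits / len(reference)


def unweighted_average_recall(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Mean of per-class recall, so rare classes (e.g. laughter) weigh as much as common ones."""
    assert len(reference) == len(predicted)
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for r, p in zip(reference, predicted):
        totals[r] += 1
        if r == p:
            hits[r] += 1
    # Recall is computed only for classes that occur in the reference labels.
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)
```

Running both functions on model predictions and on a second annotator's labels gives two numbers on the same scale, which is what a "human-level" comparison needs. UAR is the assumed choice here because the vocalization classes are unlikely to be balanced.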

Abstract

Speech technology systems struggle with many downstream tasks for child speech due to small training corpora and the difficulties that child speech poses. We apply a novel dataset, SpeechMaturity, to state-of-the-art transformer models to address a fundamental classification task: identifying child vocalizations. Unlike previous corpora, our dataset captures maximally ecologically valid child vocalizations across an unprecedented sample, comprising children acquiring 25+ languages in the U.S., Bolivia, Vanuatu, Papua New Guinea, Solomon Islands, and France. The dataset contains 242,004 labeled vocalizations, orders of magnitude more than previous work. Models were trained to distinguish between cry, laughter, mature (consonant+vowel), and immature speech (consonant or vowel only). Models trained on the dataset outperform state-of-the-art models trained on previous datasets, achieved…
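
A common way to use a self-supervised speech encoder for utterance-level classification is to pool its frame-level outputs into one vector and pass that vector to a small classification head. The sketch below illustrates that pattern for the four classes named above. It assumes frame features have already been extracted by some encoder; the label order, mean pooling, linear head, and all function names are illustrative assumptions, not the authors' architecture.

```python
import math
from typing import List, Sequence, Tuple

# The four vocalization classes named in the abstract (order assumed for illustration).
LABELS = ["cry", "laughter", "mature", "immature"]


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors into a single utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    frames: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],  # hypothetical head: len(LABELS) rows x feature_dim
    bias: Sequence[float],               # one bias term per class
) -> Tuple[str, List[float]]:
    """Pool encoder frames, apply a linear head, and return the top label with probabilities."""
    pooled = mean_pool(frames)
    logits = [
        sum(w_d * x_d for w_d, x_d in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda k: probs[k])
    return LABELS[best], probs
```

With toy 2-dimensional features `[[1.0, 0.0], [0.0, 1.0]]`, the pooled vector is `[0.5, 0.5]`. A head whose third row is `[2.0, 2.0]` then scores "mature" highest. In a real system the head would be trained on labeled vocalizations, typically while fine-tuning the encoder; the weights here are placeholders.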

Code & Models

Repositories

spoglab-stanford/w2v2-pro-sm
PyTorch · Official

Taxonomy

Topics: Infant Health and Development · Speech Recognition and Synthesis · Language Development and Disorders