Employing self-supervised learning models for cross-linguistic child speech maturity classification
Theo Zhang, Madurya Suresh, Anne S. Warlaumont, Kasia Hitczenko, Alejandrina Cristia, Margaret Cychosz

TL;DR
This paper introduces a large, diverse dataset of child vocalizations and demonstrates that self-supervised transformer models trained on it can accurately classify different types of child speech, outperforming previous models and approaching human-level accuracy.
Contribution
The study presents the SpeechMaturity dataset with over 242,000 vocalizations across 25+ languages, and shows that self-supervised models trained on this data significantly improve child speech classification.
Findings
The models outperform previous state-of-the-art classifiers trained on earlier datasets.
They achieve accuracy comparable to human judgments.
Performance is robust across different environments.
Abstract
Speech technology systems struggle with many downstream tasks for child speech due to small training corpora and the difficulties that child speech poses. We apply a novel dataset, SpeechMaturity, to state-of-the-art transformer models to address a fundamental classification task: identifying child vocalizations. Unlike previous corpora, our dataset captures maximally ecologically valid child vocalizations across an unprecedented sample, comprising children acquiring 25+ languages in the U.S., Bolivia, Vanuatu, Papua New Guinea, Solomon Islands, and France. The dataset contains 242,004 labeled vocalizations, orders of magnitude larger than previous work. Models were trained to distinguish between cry, laughter, mature speech (consonant+vowel), and immature speech (just consonant or vowel). Models trained on the dataset outperform state-of-the-art models trained on previous datasets, achieved…
Taxonomy
Topics: Infant Health and Development · Speech Recognition and Synthesis · Language Development and Disorders
