Baby Scale: Investigating Models Trained on Individual Children's Language Input
Steven Y. Feng, Alvin W.M. Tan, Michael C. Frank

TL;DR
This study examines how language models trained on individual children's language input from the BabyView dataset perform on grammar, semantic, and world knowledge tasks, offering insights into linguistic development and the effects of training-data quality.
Contribution
It analyzes models trained on child language input, examining how performance varies across individual children's data and which linguistic properties of a dataset predict its quality for learning.
Findings
Models trained on child data show acceptable scaling on grammar tasks.
Semantic and world knowledge tasks show weaker scaling than for models trained on synthetic data.
Word likelihoods in models correlate with children's word learning.
Abstract
Modern language models (LMs) must be trained on many orders of magnitude more words of training data than human children receive before they begin to produce useful behavior. Assessing the nature and origins of this "data gap" requires benchmarking LMs on human-scale datasets to understand how linguistic knowledge emerges from children's natural training data. Using transcripts from the BabyView dataset (videos from children ages 6-36 months), we investigate (1) scaling performance at child-scale data regimes, (2) variability in model performance across datasets from different children's experiences and linguistic predictors of dataset quality, and (3) relationships between model and child language learning outcomes. LMs trained on child data show acceptable scaling for grammar tasks, but lower scaling on semantic and world knowledge tasks than models trained on synthetic data; we also…
