On the robustness of modeling grounded word learning through a child's egocentric input

Wai Keen Vong; Brenden M. Lake

arXiv:2507.14749·cs.CL·January 9, 2026

Open Access

TL;DR

This study investigates whether multimodal neural networks trained on automatically transcribed video data from individual children can reliably learn word-referent mappings. It reports that learning is robust across different children and image domains.

Contribution

The paper extends prior work (Vong et al., 2024) by applying automated transcription to a large, multi-child dataset, testing both the robustness of grounded word learning and the individual differences in it.

Findings

01. Networks trained on each child's data can learn word-referent mappings.

02. Models generalize across different children and image domains.

03. Individual differences influence how models acquire language from developmental experiences.

Abstract

What insights can machine learning bring to understanding human language acquisition? Large language and multimodal models have achieved remarkable capabilities, but their reliance on massive training datasets creates a fundamental mismatch with children, who succeed in acquiring language from comparatively limited input. To help bridge this gap, researchers have increasingly trained neural networks using data similar in quantity and quality to children's input. Taking this approach to the limit, Vong et al. (2024) showed that a multimodal neural network trained on 61 hours of visual and linguistic input extracted from just one child's developmental experience could acquire word-referent mappings. However, whether this approach's success reflects the idiosyncrasies of a single child's experience, or whether it would show consistent and robust learning patterns across multiple children's…

Taxonomy

Topics: Child and Animal Learning Development · Language Development and Disorders · Multimodal Machine Learning Applications