Assessing the alignment between infants' visual and linguistic experience using multimodal language models
Alvin Wei Ming Tan, Jane Yang, Tarun Sepuri, Khai Loong Aw, Robert Z. Sparks, Zi Yin, Virginia A. Marchman, Michael C. Frank, Bria Long

TL;DR
This study uses CLIP models to automatically measure how closely infants' visual and linguistic experiences align in everyday home environments, and finds that aligned moments, which most accounts treat as the basis for early word learning, are infrequent.
Contribution
It introduces a novel automated method using CLIP to assess vision-language alignment in infant videos, addressing limitations of manual annotation and providing new insights into early language acquisition.
Findings
Aligned moments are rare in infants' natural environments.
Variability in alignment exists both within and across children.
The method offers a new way to study multimodal learning environments.
Abstract
Figuring out which objects or concepts words refer to is a central language learning challenge for young children. Most models of this process posit that children learn early object labels from co-occurrences of words and their referents that occur when someone around them talks about an object in the immediate physical environment. But how aligned in time are children's visual and linguistic experiences during everyday learning? To date, answers to this question have been limited by the need for labor-intensive manual annotations of vision-language co-occurrences. Here, we evaluate the use of contrastive language-image pretraining (CLIP) models to automatically characterize vision-language alignment in egocentric videos taken from the infant perspective in home environments. After validating CLIP alignment scores using human alignment judgments, we apply this metric to a large corpus…
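As a rough illustration of what a CLIP-style alignment score could look like, the sketch below computes cosine similarity between an image embedding and a text embedding, then reports the share of moments above a cutoff. This is a minimal sketch under stated assumptions, not the authors' pipeline: the embeddings are toy vectors standing in for the outputs of CLIP's image and text encoders, and the function names and the `threshold` value are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def alignment_scores(
    frame_embeddings: Sequence[Sequence[float]],
    utterance_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Score each (frame, utterance) pair that co-occurs in time."""
    return [
        cosine_similarity(f, u)
        for f, u in zip(frame_embeddings, utterance_embeddings)
    ]


def fraction_aligned(scores: Sequence[float], threshold: float) -> float:
    """Share of moments whose score meets a (hypothetical) cutoff."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Toy 2-D vectors; real CLIP embeddings have hundreds of dimensions.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
utterances = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

scores = alignment_scores(frames, utterances)
share = fraction_aligned(scores, threshold=0.5)  # threshold is an assumption
```

In practice the vectors would come from a pretrained CLIP model applied to video frames and transcribed utterances, and any cutoff would need to be calibrated against human alignment judgments, as the abstract describes for the score itself.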
Taxonomy
Topics: Language Development and Disorders · Child and Animal Learning Development · Categorization, perception, and language
