Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning
Xueyi Ke, Satoshi Tsutsui, Yayun Zhang, Bihan Wen

TL;DR
This study investigates whether a computational model mimicking infant learning can develop broader visual concepts beyond its linguistic vocabulary, offering insights into early visual development and advancing computer vision models.
Contribution
The paper analyzes the internal representations of an infant-trained model to identify hidden visual concepts, then compares those representations with the ones learned by modern computer vision models, bridging cognitive science and AI.
Findings
Neurons recognizing objects beyond the model's vocabulary
Identification of hidden visual concept neurons in the model
Differences in representations between infant and modern models
Abstract
Infants develop complex visual understanding rapidly, even preceding the acquisition of linguistic skills. As computer vision seeks to replicate the human vision system, understanding infant visual development may offer valuable insights. In this paper, we present an interdisciplinary study exploring this question: can a computational model that imitates the infant learning process develop broader visual concepts that extend beyond the vocabulary it has heard, similar to how infants naturally learn? To investigate this, we analyze a recently published model in Science by Vong et al., which is trained on longitudinal, egocentric images of a single child paired with transcribed parental speech. We perform neuron labeling to identify visual concept neurons hidden in the model's internal representations. We then demonstrate that these neurons can recognize objects beyond the model's…
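The neuron labeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not necessarily the procedure used in the paper: it assumes a simple "top-activating images" heuristic, in which each neuron is assigned the concept that appears most often among the probe images that activate it most strongly. The function names (`label_neurons`, `out_of_vocabulary`), the `top_k` parameter, and all toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def label_neurons(activations, image_concepts, top_k=3):
    """Assign each neuron a concept label (illustrative heuristic only).

    activations[n][i] is the activation of neuron n on probe image i.
    image_concepts[i] is the concept annotation of probe image i.
    Returns one (concept, purity) pair per neuron, where purity is the
    fraction of the top_k images that carry the winning concept.
    """
    labels = []
    for neuron_acts in activations:
        # Indices of the top_k most strongly activating probe images.
        top = sorted(
            range(len(neuron_acts)),
            key=lambda i: neuron_acts[i],
            reverse=True,
        )[:top_k]
        counts = Counter(image_concepts[i] for i in top)
        concept, hits = counts.most_common(1)[0]
        labels.append((concept, hits / len(top)))
    return labels


def out_of_vocabulary(labels, vocabulary):
    """Return (neuron index, concept) pairs whose concept is not a vocabulary word."""
    return [(n, c) for n, (c, _) in enumerate(labels) if c not in vocabulary]


# Toy data (made up): six probe images and two neurons.
image_concepts = ["ball", "ball", "rug", "rug", "rug", "cat"]
activations = [
    [0.9, 0.8, 0.1, 0.0, 0.2, 0.7],  # neuron 0: fires mostly on "ball" images
    [0.1, 0.0, 0.9, 0.8, 0.7, 0.2],  # neuron 1: fires mostly on "rug" images
]
vocabulary = {"ball", "cat"}  # hypothetical words the model was trained on

labels = label_neurons(activations, image_concepts, top_k=3)
hidden = out_of_vocabulary(labels, vocabulary)
print(labels)
print(hidden)
```

In this toy setup, neuron 1 responds to "rug" images even though "rug" is not in the hypothetical training vocabulary, which is the kind of hidden, out-of-vocabulary concept neuron the abstract describes.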
Taxonomy
Topics: Visual and Cognitive Learning Processes · Spatial Cognition and Navigation · Science Education and Pedagogy
Methods: Contrastive Language-Image Pre-training
