Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video

Zekai Shi; Zhixi Cai; Kalin Stefanov

arXiv:2511.11725 · cs.CV · November 18, 2025

TL;DR

This study introduces a biologically inspired masking strategy for self-supervised visual representation learning and evaluates it on word-referent mapping using egocentric video recorded from one child's perspective.

Contribution

The paper proposes a novel masking approach, based on the blind spot in the human eye, for the masked-autoencoder visual backbone of a language acquisition model.

Findings

01. Biologically plausible masking performs comparably to random masking.

02. The approach effectively learns word-referent mappings from naturalistic video data.

03. The method aligns with human visual processing mechanisms.

Abstract

Typically, children start to learn their first words between 6 and 9 months, linking spoken utterances to their visual referents. Without prior knowledge, a word encountered for the first time can be interpreted in countless ways; it might refer to any of the objects in the environment, their components, or their attributes. In this work, we use longitudinal, egocentric, and ecologically valid data from the experience of one child to propose a self-supervised and biologically plausible strategy for learning strong visual representations. Our masked autoencoder-based visual backbone incorporates knowledge about the blind spot in human eyes to define a novel masking strategy. This mask-and-reconstruct approach attempts to mimic the way the human brain fills in the gaps in the eyes' field of view. This represents a significant shift from standard random masking strategies, which are difficult to…
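
To make the contrast between the two masking strategies concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not give the mask geometry, its position, or the patch grid, so the elliptical region, its centre and radii, the 14 x 14 grid, and the function names `random_patch_mask` and `blind_spot_patch_mask` are all assumptions made for illustration.

```python
import numpy as np


def random_patch_mask(grid_h, grid_w, mask_ratio, rng):
    """Standard random masking: hide a random subset of patches."""
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    # Pick n_masked distinct patch indices uniformly at random.
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    return mask.reshape(grid_h, grid_w)


def blind_spot_patch_mask(grid_h, grid_w, center_yx, radii_yx):
    """Hide patches whose centres fall inside a fixed ellipse.

    The ellipse is a hypothetical stand-in for a blind-spot region;
    its shape and location are assumptions, not values from the paper.
    """
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    cy, cx = center_yx
    ry, rx = radii_yx
    # Patch (y, x) has its centre at (y + 0.5, x + 0.5) in grid units.
    dist = ((ys + 0.5 - cy) / ry) ** 2 + ((xs + 0.5 - cx) / rx) ** 2
    return dist <= 1.0


rng = np.random.default_rng(0)
rand_mask = random_patch_mask(14, 14, mask_ratio=0.75, rng=rng)
spot_mask = blind_spot_patch_mask(14, 14, center_yx=(7.0, 10.0), radii_yx=(3.0, 2.0))

print("random mask hides", int(rand_mask.sum()), "patches, scattered")
print("blind-spot mask hides", int(spot_mask.sum()), "patches, contiguous")
```

In both cases the boolean mask marks the patches that would be withheld from the encoder and reconstructed by the decoder. The random mask scatters hidden patches across the frame, while the region-based mask hides one contiguous area, which is closer to the gap-filling situation the abstract describes.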

Taxonomy

Topics: Child and Animal Learning Development · Multimodal Machine Learning Applications · Language Development and Disorders