Self-supervised learning through the eyes of a child
A. Emin Orhan, Vaibhav V. Gupta, Brenden M. Lake

TL;DR
This paper investigates how self-supervised deep learning models can develop high-level visual representations from natural egocentric videos recorded from the perspective of young children, shedding light on early visual development and the role of innate biases.
Contribution
The paper demonstrates that generic self-supervised learning on realistic egocentric videos can produce high-level visual representations similar to those in children, advancing the understanding of visual development.
Findings
Powerful visual representations emerge from natural videos.
Self-supervised learning models can mimic early visual development.
Results support the sufficiency of generic learning mechanisms.
Abstract
Within months of birth, children develop meaningful expectations about the world around them. How much of this early knowledge can be explained through generic learning mechanisms applied to sensory data, and how much of it requires more substantive innate inductive biases? Addressing this fundamental question in its full generality is currently infeasible, but we can hope to make real progress in more narrowly defined domains, such as the development of high-level visual categories, thanks to improvements in data collecting technology and recent progress in deep learning. In this paper, our goal is precisely to achieve such progress by utilizing modern self-supervised deep learning methods and a recent longitudinal, egocentric video dataset recorded from the perspective of three young children (Sullivan et al., 2020). Our results demonstrate the emergence of powerful, high-level visual…
Taxonomy
Topics: Innovative Teaching and Learning Methods · Education and Critical Thinking Development · Child and Animal Learning Development
