How much human-like visual experience do current self-supervised learning algorithms need in order to achieve human-level object recognition?
A. Emin Orhan

TL;DR
This study estimates that current self-supervised visual learning algorithms require vastly more natural visual experience than humans to reach human-level object recognition, highlighting significant gaps in data efficiency.
Contribution
The paper provides the first quantitative estimates of the amount of natural visual experience needed for algorithms to match human performance, revealing it is orders of magnitude greater than a human lifetime.
Findings
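Depending on the algorithm, reaching human-level performance on ImageNet is estimated to take on the order of a million to a billion years of natural visual experience.
Estimates are even larger for human-level performance on ImageNet-derived robustness benchmarks.
The exact values are sensitive to some underlying assumptions, but even in the most optimistic scenarios they remain orders of magnitude larger than a human lifetime.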
Abstract
This paper addresses a fundamental question: how good are our current self-supervised visual representation learning algorithms relative to humans? More concretely, how much "human-like" natural visual experience would these algorithms need in order to reach human-level performance in a complex, realistic visual object recognition task such as ImageNet? Using a scaling experiment, here we estimate that the answer is several orders of magnitude longer than a human lifetime: typically on the order of a million to a billion years of natural visual experience (depending on the algorithm used). We obtain even larger estimates for achieving human-level performance in ImageNet-derived robustness benchmarks. The exact values of these estimates are sensitive to some underlying assumptions; however, even in the most optimistic scenarios they remain orders of magnitude larger than a human lifetime.…
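The kind of extrapolation a scaling experiment implies can be sketched as follows. This is a minimal illustration and not the paper's procedure: the log-linear functional form, the data points, the 0.90 target accuracy, and the 12 waking hours per day are all assumptions made up for the example, and the helper names are hypothetical.

```python
import math


def fit_log_linear(hours, accuracy):
    """Least-squares fit of accuracy = a + b * log10(hours).

    The log-linear form is an assumption of this sketch.
    """
    xs = [math.log10(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def hours_to_reach(target, a, b):
    """Invert the fitted line to get the hours needed for a target accuracy."""
    return 10 ** ((target - a) / b)


def hours_to_years(hours, waking_hours_per_day=12.0):
    """Convert hours of visual experience to years (waking hours are assumed)."""
    return hours / (waking_hours_per_day * 365.0)


# Placeholder measurements, invented for illustration only (NOT from the paper):
# hours of video used for training and the resulting accuracy.
hours = [10.0, 100.0, 1000.0]
accuracy = [0.10, 0.15, 0.20]

a, b = fit_log_linear(hours, accuracy)
needed = hours_to_reach(0.90, a, b)  # 0.90 is a hypothetical human-level target
print(f"Extrapolated requirement: {hours_to_years(needed):.3g} years")
```

Because the fit is linear in the logarithm of the data size, small changes in the fitted slope or in the target accuracy shift the extrapolated requirement by orders of magnitude, which is consistent with the abstract's remark that the estimates are sensitive to underlying assumptions.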
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Remote-Sensing Image Classification · Image Processing Techniques and Applications
