Partial success in closing the gap between human and machine vision
Robert Geirhos, Kantharaju Narayanappa, Benjamin Mitzkus, Tizian Thieringer, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

TL;DR
This study asks whether the gap between human and machine vision is closing by testing humans and models on a broad range of out-of-distribution (OOD) datasets. The best models now match or exceed human accuracy on most of the tested datasets, but humans and models still make different errors.
Contribution
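The paper provides human behavioral data on a broad range of OOD datasets (85,120 psychophysical trials across 90 participants) and evaluates modern models that differ from standard supervised CNNs in objective function, architecture, and training dataset size, showing both recent advances and remaining differences between human and machine visual perception.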
Findings
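The best models now exceed human performance on most of the investigated OOD datasets.
Humans and models still make different errors at the image level, whereas most models agree closely with one another in their errors.
In many cases, increasing training dataset size by orders of magnitude (the study spans 1M to 1B images) improves human-to-model behavioral consistency.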
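To make "error agreement" concrete, here is a minimal sketch of one common way to score it: a Cohen's-kappa-style error consistency computed from per-image correctness. The function name `error_consistency` and the toy inputs are illustrative assumptions, not taken from the paper's code, and the paper's exact procedure may differ.

```python
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Kappa-style agreement between two observers' correct/incorrect patterns.

    Assumption: both sequences refer to the same images in the same order.
    Returns 1.0 for identical patterns, about 0.0 for agreement at chance
    level given the two accuracies, and negative values for systematic
    disagreement.
    """
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("inputs must be non-empty and of equal length")

    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n

    # Observed agreement: both right or both wrong on the same image.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Agreement expected from two independent observers with these accuracies.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if c_exp == 1.0:
        # Both observers are always right (or always wrong): kappa is undefined.
        return float("nan")
    return (c_obs - c_exp) / (1 - c_exp)


# Hypothetical toy data: True means the image was classified correctly.
human = [True, True, False, False]
model = [True, False, True, False]
print(error_consistency(human, model))
```

Raw accuracy alone cannot separate two observers who fail on the same images from two who fail on different ones, which is why a chance-corrected score of this kind is useful when comparing error patterns.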
Abstract
A few years ago, the first CNN surpassed human performance on ImageNet. However, it soon became clear that machines lack robustness on more challenging test cases, a major obstacle towards deploying machines "in the wild" and towards obtaining better computational models of human visual perception. Here we ask: Are we making progress in closing the gap between human and machine vision? To answer this question, we tested human observers on a broad range of out-of-distribution (OOD) datasets, recording 85,120 psychophysical trials across 90 participants. We then investigated a range of promising machine learning developments that crucially deviate from standard supervised CNNs along three axes: objective function (self-supervised, adversarially trained, CLIP language-image training), architecture (e.g. vision transformers), and dataset size (ranging from 1M to 1B). Our findings are…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Face Recognition and Perception
Methods: Contrastive Language-Image Pre-training
