Partial success in closing the gap between human and machine vision

Robert Geirhos; Kantharaju Narayanappa; Benjamin Mitzkus; Tizian Thieringer; Matthias Bethge; Felix A. Wichmann; Wieland Brendel

arXiv:2106.07411 · cs.CV · October 26, 2021 · 63 cites

TL;DR

This study asks whether the gap between human and machine vision is closing by testing both humans and models on diverse out-of-distribution (OOD) datasets. The best models now exceed human robustness on these datasets, but they still make different errors than humans do.

Contribution

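The paper provides human behavioral data on OOD datasets (85,120 psychophysical trials across 90 participants) and evaluates modern models that differ from standard supervised CNNs in objective function, architecture, and training dataset size, revealing both recent advances and remaining differences between human and machine visual perception.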

Findings

01

The best models now exceed human robustness on the OOD datasets tested.

02

Humans and models make different errors, while models agree more closely with one another in the errors they make.

03

Increasing training data size improves human-model behavioral alignment.
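
The error agreement mentioned in finding 02 can be made concrete with a small example. The sketch below is an illustration, not the authors' code: it assumes agreement is scored with Cohen's kappa over per-image correctness, which corrects the raw overlap of two observers' errors for the overlap their accuracies alone would produce by chance. The function name `error_consistency` and the boolean input format are hypothetical choices made for this example.

```python
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Cohen's kappa over per-image correctness of two observers.

    Each sequence holds one entry per image: True if that observer
    (a human or a model) classified the image correctly, else False.
    Assumption: both observers saw the same images in the same order.
    """
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("need two equal-length, non-empty sequences")

    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n

    # Observed agreement: both right or both wrong on the same image.
    observed = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Agreement expected from two independent observers with these accuracies.
    expected = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if expected == 1.0:
        # Kappa is undefined when both observers are always right or always
        # wrong; returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    human = [True, True, False, False]
    model = [True, False, True, False]
    print(error_consistency(human, human))
    print(error_consistency(human, model))
```

A value near 1 means the two observers fail on the same images, a value near 0 means their errors overlap no more than chance predicts, and negative values mean they tend to fail on different images.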

Abstract

A few years ago, the first CNN surpassed human performance on ImageNet. However, it soon became clear that machines lack robustness on more challenging test cases, a major obstacle towards deploying machines "in the wild" and towards obtaining better computational models of human visual perception. Here we ask: Are we making progress in closing the gap between human and machine vision? To answer this question, we tested human observers on a broad range of out-of-distribution (OOD) datasets, recording 85,120 psychophysical trials across 90 participants. We then investigated a range of promising machine learning developments that crucially deviate from standard supervised CNNs along three axes: objective function (self-supervised, adversarially trained, CLIP language-image training), architecture (e.g. vision transformers), and dataset size (ranging from 1M to 1B). Our findings are…

Code & Models

Repositories

bethgelab/model-vs-human
PyTorch · Official

Videos

Partial success in closing the gap between human and machine vision · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Face Recognition and Perception

Methods: Contrastive Language-Image Pre-training