# Behavioral differences between humans and machines arise early in visual processing

**Authors:** Thomas Klein, Wieland Brendel, Felix A. Wichmann

PMC · DOI: 10.1167/jov.26.2.9 · Journal of Vision · 2026-02-17

## TL;DR

This study shows that differences in visual recognition behavior between humans and deep neural networks (DNNs) arise early in visual processing rather than in late, high-level reasoning.

## Contribution

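Using backward-masked natural images shown for 8.3 to 267 ms, the study demonstrates that behavioral differences between DNNs and human observers cannot be attributed to late processing; they point instead to systematic differences in early visual processing.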

## Key findings

- Error consistency (EC) between DNNs and humans never exceeded 0.4 at any tested presentation time (8.3 to 267 ms), the same ceiling reported in earlier work with longer presentation times.
- Differences in visual processing between DNNs and humans are not explained by late high-level reasoning.
- Behavioral differences arise early in the visual processing pipeline.

## Abstract

It remains an open question to what extent current deep neural networks (DNNs) are suitable computational models of the human visual system. While DNNs have proven to be capable of predicting neural activations in primate visual cortex with great success, psychophysical experiments have shown behavioral differences between DNNs and human observers. One of these behavioral differences is which individual images DNNs and human observers find easy or difficult to recognize, as quantified by error consistency (EC). Hypothetically, the reported differences in EC could arise late in visual processing, even though the representations extracted by DNNs and human observers may have been more similar in the initial forward sweep: At the presentation and response times investigated in earlier work, observer-internal idiosyncrasies (e.g., in feedback-mediated memory) might have influenced the final behavioral responses, lowering EC between DNNs and human observers. To test this hypothesis, we systematically vary presentation times of backward-masked stimuli from 8.3 to 267 ms and measure human performance on a speeded eightfold identification task with natural images. Contrary to the hypothesis that error consistency peaks early in time, we find that it never exceeds the value of 0.4 known from previous work with longer presentation times, suggesting that the differences between DNNs and humans cannot be explained by late high-level reasoning but point to systematic processing differences between DNNs and the early human visual system.
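To make the error consistency measure concrete, here is a minimal sketch, assuming EC is computed as Cohen's kappa over trial-by-trial correctness, as is common in earlier work on this metric. The abstract does not spell out the formula, so the function name `error_consistency` and the toy response vectors are illustrative assumptions, not the authors' code or data.

```python
import math


def error_consistency(correct_a, correct_b):
    """Cohen's kappa on trial-by-trial correctness of two observers.

    correct_a, correct_b: equal-length sequences of truthy/falsy values,
    one entry per image (True = that observer classified it correctly).
    Returns a value in [-1, 1]; 0 means the overlap in errors is what two
    independent observers with the same accuracies would produce.
    """
    a = [bool(x) for x in correct_a]
    b = [bool(x) for x in correct_b]
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(a)
    # Observed consistency: fraction of images where both observers are
    # right or both are wrong.
    c_obs = sum(x == y for x, y in zip(a, b)) / n

    # Expected consistency if the two observers erred independently,
    # given only their overall accuracies.
    p_a = sum(a) / n
    p_b = sum(b) / n
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)

    # Kappa is undefined when expected consistency is already 1
    # (both observers always right, or both always wrong).
    if math.isclose(c_exp, 1.0):
        return math.nan
    return (c_obs - c_exp) / (1 - c_exp)


# Hypothetical toy data: 1 = correct, 0 = incorrect, one entry per image.
human = [1, 1, 1, 0, 0, 1, 0, 1]
model = [1, 1, 0, 0, 1, 1, 1, 1]
print(error_consistency(human, model))
```

Because the expected-agreement term corrects for accuracy, two observers with identical accuracy can still have low EC if they err on different images, which is the kind of difference the study measures.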

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090], Canis lupus familiaris (dog, subspecies) [taxon 9615]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12922715/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12922715/full.md

## References

91 references — full list in the complete paper: https://tomesphere.com/paper/PMC12922715/full.md

---
Source: https://tomesphere.com/paper/PMC12922715