# Recurrent issues with deep neural network models of visual recognition

**Authors:** Timothee Maniquet, Hans Op de Beeck, Andrea Ivan Costantino

PMC · DOI: 10.1038/s41598-025-20245-w · Scientific Reports · 2025-10-17

## TL;DR

This study compares recurrent and feedforward deep neural networks as models of human visual recognition behavior. It finds that model size, not recurrence, is most strongly linked to task performance and to consistency with human difficulty patterns.

## Contribution

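The study provides a human behavioral benchmark for a categorization task on stimuli manipulated by occlusion, partial viewing, clutter, and spatial phase scrambling. Testing a wide range of architectures on this benchmark, it challenges the notion that recurrent DNNs model human recognition behavior better than feedforward DNNs.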

## Key findings

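- Performance gains were most strongly linked to model size; architecture (recurrent vs. feedforward) seemingly played no role, even for the more challenging manipulations.
- Larger models were more consistent with humans on which manipulations were more difficult, regardless of architecture.
- In recurrent models, but not feedforward ones, larger size was associated with a poorer match to human confusion matrices.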
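One way to quantify the second finding (agreement on which manipulations are harder) is a rank correlation between per-manipulation accuracies of humans and a model. The sketch below is only an illustration: this summary does not state which metric the authors used, and the accuracy values are made-up placeholders, not results from the paper.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder accuracies (NOT from the paper), one per manipulation.
manipulations = ["occlusion", "partial viewing", "clutter", "phase scrambling"]
human_acc = [0.80, 0.85, 0.70, 0.60]
model_acc = [0.65, 0.75, 0.50, 0.40]

# High values mean the model finds the same manipulations hard as humans do,
# even if its absolute accuracy is lower.
consistency = spearman(human_acc, model_acc)
print(f"difficulty-pattern consistency: {consistency:.2f}")
```

A rank-based measure is used here because it separates the ordering of difficulty from the overall accuracy level, which matters when comparing models of very different sizes.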

## Abstract

Object recognition requires flexible and robust information processing, especially in view of the challenges posed by naturalistic visual settings. The ventral stream in visual cortex is provided with this robustness by its recurrent connectivity. Recurrent deep neural networks (DNNs) have recently emerged as promising models of the ventral stream, surpassing feedforward DNNs in the ability to account for brain representations. In this study, we asked whether recurrent DNNs could also better account for human behaviour during visual recognition. We assembled a stimulus set that includes manipulations that are often associated with recurrent processing in the literature, like occlusion, partial viewing, clutter, and spatial phase scrambling. We obtained a benchmark dataset from human participants performing a categorisation task on this stimulus set. By applying a wide range of model architectures to the same task, we uncovered a nuanced relationship between recurrence, model size, and performance. First, results show that increases in performance were most strongly linked to increases in model size, with architecture seemingly not playing a role, even for more challenging manipulations. Second, we found larger models to be more consistent with humans on which manipulations they found more difficult, regardless of model architecture. Finally, we found a negative effect of size in matching human confusion matrices in recurrent but not feedforward DNNs. Contrary to previous assumptions, our findings challenge the notion that recurrent models are better models of human recognition behaviour than feedforward models, and emphasise the complexity of incorporating recurrence into computational models.
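The final result in the abstract concerns how well models match human confusion matrices. As a rough illustration of what such a comparison can look like, the sketch below builds a confusion matrix from trial-level responses and correlates the off-diagonal error rates of two observers. The function names, the category labels, the toy trials, and the choice of Pearson correlation over error rates are all assumptions for illustration; the summary does not specify the authors' procedure.

```python
from statistics import mean

def confusion_matrix(true_labels, responses, categories):
    """Count how often each true category (row) received each response (column)."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, r in zip(true_labels, responses):
        counts[index[t]][index[r]] += 1
    return counts

def off_diagonal_rates(counts):
    """Row-normalised error rates, skipping the diagonal (correct responses)."""
    rates = []
    for i, row in enumerate(counts):
        total = sum(row)
        for j, c in enumerate(row):
            if i != j:
                rates.append(c / total if total else 0.0)
    return rates

def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def confusion_similarity(counts_a, counts_b):
    """Similarity of two error patterns, ignoring overall accuracy."""
    return pearson(off_diagonal_rates(counts_a), off_diagonal_rates(counts_b))

# Toy trials (hypothetical, NOT data from the paper).
categories = ["cat", "dog", "car"]
true = ["cat", "cat", "cat", "dog", "dog", "dog", "car", "car", "car"]
human = ["cat", "dog", "cat", "dog", "cat", "dog", "car", "car", "car"]
model = ["cat", "dog", "dog", "dog", "cat", "dog", "car", "car", "dog"]

human_cm = confusion_matrix(true, human, categories)
model_cm = confusion_matrix(true, model, categories)
similarity = confusion_similarity(human_cm, model_cm)
print(f"confusion-matrix similarity: {similarity:.2f}")
```

Restricting the comparison to off-diagonal entries focuses it on which errors are made rather than how many, so a model can score well on accuracy yet poorly on this measure.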

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12534488/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12534488/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12534488/full.md

---
Source: https://tomesphere.com/paper/PMC12534488