Deep Networks Can Resemble Human Feed-forward Vision in Invariant Object Recognition
Saeed Reza Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, Timothée Masquelier

TL;DR
This study benchmarks deep neural networks against human performance in view-invariant object recognition. Deeper models can match or surpass humans under large viewpoint variations, while some shallow models outperform deep ones under small variations.
Contribution
It systematically compares eight DCNNs, the HMAX model, and a baseline shallow model to human performance across different magnitudes of viewpoint variation, highlighting the importance of depth for human-like recognition.
Findings
Deeper networks perform better with larger viewpoint variations.
Shallow models can outperform deep models with small variations.
Very deep networks can surpass human performance at high viewpoint variations.
Abstract
Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have been shown to be able to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields, and a hierarchy of layers which progressively extract more and more abstracted features. Yet it is unknown whether DCNNs match human performance at the task of view-invariant object recognition, whether they make similar errors and use similar representations for this task, and whether the answers depend on the magnitude of the viewpoint variations. To investigate these issues, we benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model and compared their results to those of humans with backward masking. Unlike in all previous DCNN studies, we carefully controlled the…
