Humans and deep networks largely agree on which kinds of variation make object recognition harder
Saeed Reza Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, Timothée Masquelier

TL;DR
This study compares human and deep neural network object recognition under various transformations, revealing significant similarities in their difficulty patterns and suggesting shared underlying mechanisms.
Contribution
First systematic comparison of human and DCNN view-invariant object recognition using identical images and controlled transformations.
Findings
Humans and DCNNs agree on the difficulty hierarchy of transformations.
Rotation in depth is the most challenging variation for both.
The level of variation in rotation in depth and in scale significantly affects recognition performance.
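One way to make "agree on the difficulty hierarchy" concrete is to compare how two observers rank the transformation types by accuracy, for example with a Spearman rank correlation. The sketch below is a hypothetical illustration, not the authors' analysis: the function names, the dictionary input format, and the choice of Spearman correlation are all assumptions. It takes per-transformation accuracies (higher means easier) for humans and for a model and returns +1.0 when the difficulty orderings match exactly and -1.0 when they are reversed.

```python
# Hypothetical sketch (not the paper's method): measure agreement between two
# difficulty orderings with a Spearman rank correlation, using only the stdlib.
from typing import Dict, List


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend the group while the next value ties with the current one.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant inputs")
    return cov / (vx * vy) ** 0.5


def difficulty_agreement(human_acc: Dict[str, float],
                         model_acc: Dict[str, float]) -> float:
    """Rank-correlate per-transformation accuracies over the shared conditions."""
    conditions = sorted(human_acc.keys() & model_acc.keys())
    return spearman([human_acc[c] for c in conditions],
                    [model_acc[c] for c in conditions])
```

The condition names and accuracy values passed to `difficulty_agreement` would come from an experiment. None are given here, because inventing them would misrepresent the paper's results. A rank-based measure is used because the claim concerns the ordering of difficulty rather than matching absolute accuracy levels.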
Abstract
View-invariant object recognition is a challenging problem, which has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g. 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts more and more invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNN), which are currently the best algorithms for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs at view-invariant object recognition using the same images and controlling for both the kinds of transformation…
