ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel

TL;DR
This study shows that ImageNet-trained CNNs rely on texture more than on shape. Training on stylized images shifts them towards shape, which improves accuracy and robustness and brings their behaviour closer to human perception.
Contribution
The paper demonstrates that CNNs trained on Stylized-ImageNet develop a shape bias similar to humans, leading to enhanced performance and robustness compared to texture-biased models.
Findings
ImageNet-trained CNNs are biased towards textures rather than shapes.
Training on Stylized-ImageNet shifts CNNs to a shape bias.
Shape-biased CNNs show improved object detection performance and greater robustness to image distortions.
Abstract
Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on "Stylized-ImageNet", a stylized version of ImageNet. This provides a much better fit for human behavioural performance…
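The cue-conflict evaluation described in the abstract can be summarised as a single number. The sketch below is a minimal illustration, not the authors' code: it assumes each cue-conflict image comes with a shape category and a texture category, and it measures shape bias as the fraction of shape decisions among all decisions that match either cue. The function name `shape_bias` and its arguments are hypothetical.

```python
import math
from typing import Sequence


def shape_bias(
    predictions: Sequence[str],
    shape_labels: Sequence[str],
    texture_labels: Sequence[str],
) -> float:
    """Fraction of shape decisions among decisions matching either cue.

    Hypothetical helper for illustration. Predictions that match neither
    the shape nor the texture category are ignored, as are images whose
    two cues agree (those are not cue conflicts).
    """
    if not (len(predictions) == len(shape_labels) == len(texture_labels)):
        raise ValueError("all inputs must have the same length")

    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # no conflict between cues, so nothing to measure
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1

    total = shape_hits + texture_hits
    if total == 0:
        return math.nan  # no prediction matched either cue
    return shape_hits / total


# Four images with cat shape and elephant texture; one prediction matches neither cue.
preds = ["cat", "elephant", "dog", "cat"]
shapes = ["cat"] * 4
textures = ["elephant"] * 4
bias = shape_bias(preds, shapes, textures)
```

A value near 1 indicates mostly shape-based decisions and a value near 0 mostly texture-based decisions. Under this definition, a texture-biased model would score well below human observers.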
Taxonomy
Topics: Face Recognition and Perception
Methods: Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling
