ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

Robert Geirhos; Patricia Rubisch; Claudio Michaelis; Matthias Bethge; Felix A. Wichmann; Wieland Brendel

arXiv:1811.12231 · cs.CV · November 11, 2022 · 664 cites

TL;DR

Standard ImageNet-trained CNNs classify images by texture more than by shape. Training on stylized images shifts that bias towards shape, which improves accuracy and robustness and brings the models closer to human perception.

Contribution

The paper demonstrates that CNNs trained on Stylized-ImageNet develop a shape bias similar to that of humans, and that these models are more accurate and more robust than texture-biased ones.

Findings

01 ImageNet-trained CNNs are biased towards textures rather than shapes.

02 Training on Stylized-ImageNet shifts CNNs towards a shape bias.

03 Shape-biased CNNs show improved object detection performance and greater robustness to image distortions.

Abstract

Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on "Stylized-ImageNet", a stylized version of ImageNet. This provides a much better fit for human behavioural performance…
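
The cue-conflict evaluation described in the abstract can be summarised with a single number: of the trials where an observer picked either the shape category or the texture category, the fraction where it picked the shape. The sketch below is a minimal illustration of that idea, not the authors' code. The function name `shape_bias`, the `(predicted, shape_label, texture_label)` trial format, and the example data are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

# Hypothetical trial format: (predicted label, shape label, texture label).
Trial = Tuple[str, str, str]


def shape_bias(trials: Iterable[Trial]) -> float:
    """Fraction of shape decisions among trials decided by shape or texture.

    Trials whose shape and texture labels agree carry no cue conflict and
    are skipped. Trials where the prediction matches neither cue are also
    excluded from the ratio.
    """
    shape_hits = 0
    texture_hits = 0
    for predicted, shape_label, texture_label in trials:
        if shape_label == texture_label:
            continue  # no conflict between the two cues
        if predicted == shape_label:
            shape_hits += 1
        elif predicted == texture_label:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        return math.nan  # undefined when no trial matched either cue
    return shape_hits / decided


# Made-up example: cat-shaped images rendered with elephant texture.
example_trials = [
    ("cat", "cat", "elephant"),       # shape decision
    ("elephant", "cat", "elephant"),  # texture decision
    ("elephant", "cat", "elephant"),  # texture decision
    ("dog", "cat", "elephant"),       # neither cue, excluded
]
bias = shape_bias(example_trials)
```

Under this definition a value near 1 indicates a shape-driven observer and a value near 0 a texture-driven one. For the made-up trials above, one shape decision out of three decided trials gives a shape bias of one third.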

Code & Models

Models

🤗 flashingtt/imagenet-r (model)

Taxonomy

Topics: Face Recognition and Perception

Methods: Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling