Interpreting Adversarially Trained Convolutional Neural Networks

Tianyuan Zhang; Zhanxing Zhu

arXiv:1905.09797·cs.LG·May 24, 2019·75 cites

TL;DR

This paper investigates how adversarial training influences CNNs' object recognition, revealing that it shifts models from texture bias towards shape bias, which enhances robustness and interpretability.

Contribution

The paper introduces systematic qualitative and quantitative methods for interpreting adversarially trained CNNs (AT-CNNs) and shows that they learn more shape-biased representations than standard CNNs.

Findings

01 Adversarial training reduces texture bias in CNNs.

02 AT-CNNs are more sensitive to shape features than standard CNNs.

03 Shape-biased models are more robust to transformations and dataset manipulations.

Abstract

We attempt to interpret how adversarially trained convolutional neural networks (AT-CNNs) recognize objects. We design systematic approaches to interpret AT-CNNs in both qualitative and quantitative ways and compare them with normally trained models. Surprisingly, we find that adversarial training alleviates the texture bias of standard CNNs when trained on object recognition tasks, and helps CNNs learn a more shape-biased representation. We validate our hypothesis from two aspects. First, we compare the salience maps of AT-CNNs and standard CNNs on clean images and images under different transformations. The comparison could visually show that the prediction of the two types of CNNs is sensitive to dramatically different types of features. Second, to achieve quantitative verification, we construct additional test datasets that destroy either textures or shapes, such as…
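
The abstract is cut off before it names the test datasets, so the following is only an illustrative sketch of one common way to destroy global shape while leaving local texture mostly intact: splitting an image into a grid of patches and permuting them. The function name `patch_shuffle`, the grid size `k`, and the toy 4x4 image are assumptions made for this example and are not taken from the paper or its repository.

```python
import random


def patch_shuffle(image, k, seed=0):
    """Split a square image (a list of rows) into a k x k grid of patches
    and return a new image with the patches randomly permuted.

    Hypothetical helper for illustration only; not the authors' code.
    """
    n = len(image)
    if any(len(row) != n for row in image) or n % k != 0:
        raise ValueError("image must be square with side divisible by k")
    p = n // k  # side length of each patch

    # Collect patches in row-major order; each patch is a list of p rows.
    patches = [
        [image[r * p + i][c * p:(c + 1) * p] for i in range(p)]
        for r in range(k)
        for c in range(k)
    ]

    # Permute the patches with a seeded generator for reproducibility.
    random.Random(seed).shuffle(patches)

    # Reassemble the permuted patches into a full image.
    out = []
    for r in range(k):
        for i in range(p):
            row = []
            for c in range(k):
                row.extend(patches[r * k + c][i])
            out.append(row)
    return out


# Toy 4x4 "image" whose pixel values encode their original positions.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled = patch_shuffle(image, k=2, seed=0)
```

Each patch keeps its interior pixels unchanged, so local texture statistics survive, while the global arrangement (and hence the object outline) is scrambled. A model that still classifies such images correctly is presumably relying on texture more than shape.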

Code & Models

Repositories

PKUAI26/AT-CNN
PyTorch, official

Taxonomy

Topics: Adversarial Robustness in Machine Learning