On the Properties of Adversarially-Trained CNNs
Mattia Carletti, Matteo Terzi, Gian Antonio Susto

TL;DR
This paper studies the properties of adversarially-trained CNNs, reporting surprising behaviors as well as limitations and failure modes, backed by extensive analyses across a range of architectures and datasets.
Contribution
The paper offers new insight into how adversarial training confers robustness, and it highlights properties and limitations of robust models that prior work did not discuss.
Findings
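Adversarially-trained models exhibit surprising properties that shed light on how robustness against adversarial attacks is implemented.
Limitations and failure modes of robust models are identified.
An extensive analysis compares robust and natural models across a wide range of architectures and datasets.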
Abstract
Adversarial Training has proved to be an effective training paradigm to enforce robustness against adversarial examples in modern neural network architectures. Despite many efforts, explanations of the foundational principles underpinning the effectiveness of Adversarial Training are limited and far from being widely accepted by the Deep Learning community. In this paper, we describe surprising properties of adversarially-trained models, shedding light on mechanisms through which robustness against adversarial attacks is implemented. Moreover, we highlight limitations and failure modes affecting these models that were not discussed by prior works. We conduct extensive analyses on a wide range of architectures and datasets, performing a deep comparison between robust and natural models.
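For orientation, adversarial training is commonly framed as a min-max problem: minimize over the model parameters the expected loss on the worst-case perturbation of each input inside a small L-infinity ball of radius eps. The sketch below applies that idea to a toy linear logistic model in pure Python. It is not the authors' CNN setup; the model, the synthetic dataset, and the names `fgsm_linear`, `train` and `accuracy` are hypothetical. A linear model is used because its inner maximization has a closed form (the FGSM step is exact there), which keeps the example short.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgsm_linear(w, x, y, eps):
    """Worst-case L-inf perturbation for a linear logistic model (hypothetical helper).

    The input gradient of the logistic loss is (p - y) * w, whose sign is
    -(2y - 1) * sign(w) for labels y in {0, 1}. Stepping eps along it is FGSM,
    and for a linear model it is also the exact maximizer over the eps-ball.
    """
    s = 2 * y - 1  # map label {0, 1} -> {-1, +1}
    return [xi - eps * s * sign(wi) for xi, wi in zip(x, w)]


def train(data, eps, epochs=50, lr=0.1, seed=0):
    """SGD on the adversarial loss: perturb each input (inner max), then descend (outer min)."""
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            x_in = fgsm_linear(w, x, y, eps)  # no-op when eps == 0 (natural training)
            err = sigmoid(logit(w, b, x_in)) - y  # dLoss/dlogit for the logistic loss
            w = [wi - lr * err * xi for wi, xi in zip(w, x_in)]
            b -= lr * err
    return w, b


def accuracy(w, b, data, eps=0.0):
    """Accuracy on clean inputs (eps=0) or on worst-case L-inf perturbed inputs."""
    correct = 0
    for x, y in data:
        x_in = fgsm_linear(w, x, y, eps)
        correct += int((logit(w, b, x_in) > 0) == (y == 1))
    return correct / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
    data = [(x, int(x[0] + 0.5 * x[1] > 0)) for x in pts]
    for name, eps_train in [("natural", 0.0), ("adversarial", 0.1)]:
        w, b = train(data, eps=eps_train)
        print(name, "clean:", accuracy(w, b, data), "robust:", accuracy(w, b, data, eps=0.1))
```

Comparing the "natural" and "adversarial" rows gives a small-scale version of the robust-versus-natural comparison the abstract describes. For deep CNNs the inner maximization has no closed form and is usually approximated with iterative attacks such as PGD.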
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
