Understanding Robust Learning through the Lens of Representation Similarities
Christian Cianfarani, Arjun Nitin Bhagoji, Vikash Sehwag, Ben Y. Zhao, Prateek Mittal, Haitao Zheng

TL;DR
This paper investigates how representations learned by robust neural networks differ from standard ones, revealing key properties and challenges that inform future robust model design.
Contribution
It introduces a comprehensive analysis of robust versus non-robust representations using similarity metrics across multiple datasets, uncovering novel properties of robust networks.
Findings
Robust networks lack representation specialization.
Block structure disappears in robust representations.
Overfitting mainly affects deeper layers during robust training.
Abstract
Representation learning, i.e. the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs). Recently, robustness to adversarial examples has emerged as a desirable property for DNNs, spurring the development of robust training methods that account for adversarial examples. In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training. This is critical to diagnosing numerous salient pitfalls in robust networks, such as degradation of performance on benign inputs, poor generalization of robustness, and increase in over-fitting. We utilize a powerful set of tools known as representation similarity metrics, across three vision datasets, to obtain layer-wise comparisons between…
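To make "layer-wise comparisons" with a representation similarity metric concrete, here is a minimal sketch. The excerpt above does not say which metric is used, so this assumes linear CKA (centered kernel alignment), one widely used metric of this kind. The function names (`linear_cka`, `layerwise_cka`) and the list-of-lists activation format are hypothetical choices for illustration, not the authors' code.

```python
import math

# Assumption: activations are given as lists of rows, one row per input
# example and one column per feature (e.g. a flattened layer output).

def center_columns(X):
    """Subtract each feature's mean over the examples (rows)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def cross_cov_sq_norm(X, Y):
    """Squared Frobenius norm of X^T Y for column-centered X and Y."""
    total = 0.0
    for xcol in zip(*X):
        for ycol in zip(*Y):
            dot = sum(a * b for a, b in zip(xcol, ycol))
            total += dot * dot
    return total

def linear_cka(X, Y):
    """Linear CKA between two activation matrices over the same examples.

    X is n x p and Y is n x q; the feature counts may differ.
    Raises ZeroDivisionError if either matrix has no variance at all.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must hold activations for the same examples")
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_cov_sq_norm(Xc, Yc)
    denominator = math.sqrt(cross_cov_sq_norm(Xc, Xc) * cross_cov_sq_norm(Yc, Yc))
    return numerator / denominator

def layerwise_cka(acts_a, acts_b):
    """Entry [i][j] compares layer i of network A with layer j of network B."""
    return [[linear_cka(a, b) for b in acts_b] for a in acts_a]
```

Under these assumptions, passing the per-layer activations of a robustly trained network and a standard network (recorded on the same inputs) to `layerwise_cka` would give a similarity matrix of the kind typically shown as a heatmap. The pure-Python loops keep the sketch dependency-free; for real activations a numerical library would be the practical choice.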
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
