Adversarial Robustness as a Prior for Learned Representations
Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry

TL;DR
This paper demonstrates that adversarial robustness in deep learning models acts as a prior that enhances the quality and interpretability of learned feature representations, making them more invertible and manipulable.
Contribution
It shows that robust optimization enforces priors on features, leading to more high-level, invertible, and interpretable representations in neural networks.
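For reference, robust optimization is usually written as a min-max problem. The form below is the standard textbook formulation, not a quotation from the paper; the choice of the l2 norm, the budget \varepsilon, and the symbols \theta, \mathcal{D}, and \mathcal{L} are assumptions made for illustration.

```latex
% Standard robust (adversarial) training objective, stated as an assumption:
%   \theta        model parameters
%   \mathcal{D}   data distribution over inputs x and labels y
%   \mathcal{L}   training loss
%   \varepsilon   perturbation budget (here under the l2 norm)
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
  \left[ \max_{\|\delta\|_2 \le \varepsilon} \mathcal{L}(\theta, x + \delta, y) \right]
```

Read this way, the inner maximization asks the model's output to stay stable under small perturbations of the input, which is the sense in which robustness can be viewed as a prior on the learned features.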
Findings
Robust models produce more invertible representations.
Robust representations allow visualization and manipulation of input features.
Adversarial robustness improves the quality of learned features.
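To make "invertible representations" concrete, the sketch below recovers an input from its representation by gradient descent on the squared distance between representations. It is a toy illustration only: the representation `tanh(W x)`, the function names `represent` and `invert_representation`, and all hyperparameters are hypothetical stand-ins, not the paper's models or code.

```python
import numpy as np


def represent(W, x):
    """Toy stand-in for a network's representation layer: R(x) = tanh(W x)."""
    return np.tanh(W @ x)


def invert_representation(W, target_rep, x_init, lr=0.02, steps=2000):
    """Gradient descent on ||R(x') - target_rep||^2 with respect to x'."""
    x = x_init.copy()
    losses = []
    for _ in range(steps):
        rep = represent(W, x)
        diff = rep - target_rep
        # Record the loss at the current iterate, before the update.
        losses.append(float(diff @ diff))
        # Chain rule: d/dx ||tanh(Wx) - t||^2 = 2 W^T [(1 - tanh^2(Wx)) * diff]
        grad = 2.0 * W.T @ ((1.0 - rep**2) * diff)
        x -= lr * grad
    return x, losses


rng = np.random.default_rng(0)
d, k = 8, 16                               # hypothetical input / representation sizes
W = rng.normal(size=(k, d)) / np.sqrt(d)   # random weights, not a trained model
x_target = rng.normal(size=d)              # the input we try to recover
x_start = rng.normal(size=d)               # unrelated starting point
x_rec, losses = invert_representation(W, represent(W, x_target), x_start)
print(f"representation loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In the setting the listing describes, `represent` would be the penultimate layer of a trained network and the inputs would be images; the claim is that this kind of optimization yields recognizable reconstructions for robust models far more readily than for standard ones.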
Abstract
An important goal in deep learning is to learn versatile, high-level feature representations of input data. However, standard networks' representations seem to possess shortcomings that, as we illustrate, prevent them from fully realizing this goal. In this work, we show that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks. It turns out that representations learned by robust models address the aforementioned shortcomings and make significant progress towards learning a high-level encoding of inputs. In particular, these representations are approximately invertible, while allowing for direct visualization and manipulation of salient input features. More broadly, our results indicate adversarial robustness as a promising avenue for improving learned representations. Our code and models for reproducing these results is…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
