Robust Models are less Over-Confident
Julia Grabinski, Paul Gavrikov, Janis Keuper, Margret Keuper

TL;DR
This paper empirically demonstrates that adversarial training not only improves robustness against attacks but also reduces overconfidence in model predictions, and that model components such as activation functions influence prediction confidence.
Contribution
It provides a comprehensive analysis of how adversarial training affects model confidence and the impact of model architecture choices on this behavior.
Findings
Adversarial training reduces overconfidence in models.
Model components like activation functions influence prediction confidence.
Robust models maintain accuracy while being less overconfident.
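One simple way to make "overconfidence" concrete is to compare a model's mean prediction confidence (the maximum softmax probability) with its accuracy on the same samples. The sketch below is only an illustration of that idea, not the paper's evaluation protocol: the function names `softmax` and `confidence_gap` and the toy logits are hypothetical and chosen for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_gap(batch_logits, labels):
    """Return (mean confidence, accuracy, gap) for a batch.

    Confidence is the maximum softmax probability of each prediction.
    A positive gap means the model is, on average, more confident than accurate.
    """
    confidences, correct = [], 0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        confidences.append(probs[pred])
        correct += int(pred == label)
    mean_conf = sum(confidences) / len(confidences)
    accuracy = correct / len(labels)
    return mean_conf, accuracy, mean_conf - accuracy

# Hypothetical logits for two toy "models" on the same four samples.
# Both make the same two mistakes; only the sharpness of the logits differs.
labels = [0, 1, 2, 1]
sharp_logits = [[9.0, 0.0, 0.0], [0.0, 9.0, 0.0], [8.0, 0.0, 1.0], [7.0, 1.0, 0.0]]
soft_logits = [[1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [1.0, 0.0, 0.8], [1.0, 0.8, 0.0]]

for name, logits in [("sharp", sharp_logits), ("soft", soft_logits)]:
    conf, acc, gap = confidence_gap(logits, labels)
    print(f"{name}: confidence={conf:.3f} accuracy={acc:.3f} gap={gap:+.3f}")
```

Both toy models have the same accuracy, so the difference in the gap comes entirely from how peaked their output distributions are. This is the kind of behavior the findings above describe when they say one model is less overconfident than another.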
Abstract
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Current attack methods are able to manipulate the network's prediction by adding specific but small amounts of noise to the input. In turn, adversarial training (AT) aims to achieve robustness against such attacks and ideally a better model generalization ability by including adversarial samples in the training set. However, an in-depth analysis of the resulting robust models beyond adversarial robustness is still pending. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and…
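To illustrate what "adding specific but small amounts of noise to the input" can look like, the sketch below applies a one-step sign-gradient perturbation (in the style of the fast gradient sign method) to a tiny logistic-regression classifier. This is a stand-in chosen for brevity, not one of the attacks or models evaluated in the paper; the weights, input, and `eps` value are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(g):
    """Return -1, 0, or 1 depending on the sign of g."""
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """One-step sign-gradient perturbation, bounded by eps in the L-infinity norm.

    For logistic regression with cross-entropy loss, dLoss/dx = (p - y) * w,
    so each input coordinate is moved by eps in the direction that raises the loss.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical weights and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.25)
print("clean p(class 1):      ", round(predict(w, b, x), 3))
print("adversarial p(class 1):", round(predict(w, b, x_adv), 3))
```

In this toy setting the perturbed input crosses the decision boundary even though no coordinate moves by more than `eps`. Adversarial training, as described in the abstract, would add pairs such as `(x_adv, y)` to the training set so that the model learns to classify them correctly.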
Taxonomy
Topics: Adversarial Robustness in Machine Learning
