Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks
Landan Seguin, Anthony Ndirango, Neeli Mishra, SueYeon Chung, Tyler Lee

TL;DR
This paper investigates the distribution of logits in adversarially-trained neural networks, revealing key characteristics and differences from standard models that are crucial for understanding how robustness is learned.
Contribution
The paper combines theoretical and empirical analysis of logit distributions under adversarial training, identifying features essential to robustness, such as logit gaps and confidence patterns.
Findings
Adversarial training reduces max logit values and logit gaps.
AT and standard models differ significantly in which samples they assign high or low confidence.
Learning information about the incorrect classes is vital for robustness.
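The quantities named in these findings can be computed directly from a model's output logits. The following is a minimal sketch, assuming the logit gap is the difference between the largest and second-largest logit for a sample, and that confidence means the softmax probability of the predicted class. The function name `logit_statistics` and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def logit_statistics(logits):
    """Return per-sample max logit, logit gap, and softmax confidence.

    logits: array of shape (n_samples, n_classes), with n_classes >= 2.
    Assumption: logit gap = largest logit minus second-largest logit.
    """
    logits = np.asarray(logits, dtype=float)
    # Sort each row ascending; the last two columns hold the top-2 logits.
    top2 = np.sort(logits, axis=1)[:, -2:]
    max_logit = top2[:, 1]
    logit_gap = top2[:, 1] - top2[:, 0]
    # Numerically stable softmax probability of the predicted class:
    # exp(0) / sum(exp(logits - max_logit)).
    shifted = logits - max_logit[:, None]
    confidence = 1.0 / np.exp(shifted).sum(axis=1)
    return max_logit, logit_gap, confidence

# Hypothetical logits for illustration only (not results from the paper).
example = np.array([[8.0, 1.0, 0.5],
                    [2.0, 1.5, 1.0]])
max_logit, logit_gap, confidence = logit_statistics(example)
```

Running the same function on logits from a standard model and from an AT model over the same test set would allow the averages of `max_logit` and `logit_gap` to be compared, and the per-sample `confidence` values to be ranked to see which samples each model treats as high or low confidence.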
Abstract
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks. Almost all defense strategies achieve this invariance through adversarial training, i.e., training on inputs with adversarial perturbations. Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood. Motivated by a recent study on learning robustness without input perturbations by distilling an AT model, we explore what is learned during adversarial training by analyzing the distribution of logits in AT models. We identify three logit characteristics essential to learning adversarial robustness. First, we provide a theoretical justification for the finding that adversarial training shrinks two important characteristics of the logit…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Bacillus and Francisella bacterial research
