Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks

Landan Seguin; Anthony Ndirango; Neeli Mishra; SueYeon Chung; Tyler Lee

arXiv:2108.12001·cs.LG·August 30, 2021·1 cites

Open Access

TL;DR

This paper investigates the distribution of logits in adversarially-trained neural networks, revealing key characteristics and differences from standard models that are crucial for understanding how robustness is learned.

Contribution

The paper provides a theoretical and empirical analysis of logit distributions under adversarial training, identifying features such as logit gaps and confidence patterns that are essential to robustness.

Findings

01

Adversarial training reduces max logit values and logit gaps.

02

Adversarially-trained (AT) and standard models differ significantly in which samples they assign high or low confidence.

03

Learning information about the incorrect classes is vital for robustness.
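
The two quantities in the first finding can be illustrated with a minimal sketch. It assumes the logit gap is the difference between the largest and second-largest logit for a sample; the function name logit_stats and the example values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import numpy as np

def logit_stats(logits):
    """Return (max_logit, logit_gap) for each row of an (n_samples, n_classes) array.

    Assumption: the logit gap is the largest logit minus the second-largest one.
    """
    logits = np.asarray(logits, dtype=float)
    # Sort each row in ascending order; the last two columns are the two largest logits.
    top2 = np.sort(logits, axis=1)[:, -2:]
    max_logit = top2[:, 1]
    logit_gap = top2[:, 1] - top2[:, 0]
    return max_logit, logit_gap

# Made-up logits for two samples and three classes (not results from the paper).
example = np.array([[2.0, 0.5, -1.0],
                    [0.3, 0.1, 0.2]])
max_logit, logit_gap = logit_stats(example)
```

Under this definition, a model whose logits look like the second row has both a smaller max logit and a smaller gap than one whose logits look like the first row, which is the direction of change the finding attributes to adversarial training.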

Abstract

Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks. Almost all defense strategies achieve this invariance through adversarial training, i.e., training on inputs with adversarial perturbations. Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood. Motivated by a recent study on learning robustness without input perturbations by distilling an AT model, we explore what is learned during adversarial training by analyzing the distribution of logits in AT models. We identify three logit characteristics essential to learning adversarial robustness. First, we provide a theoretical justification for the finding that adversarial training shrinks two important characteristics of the logit…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Bacillus and Francisella bacterial research