Adversarial Training from Mean Field Perspective
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

TL;DR
This paper uses mean field theory to analyze adversarial training in random deep neural networks, showing how shortcut connections and network width affect trainability and capacity.
Contribution
It introduces a novel mean field framework for analyzing adversarial training, derives upper bounds on the adversarial loss, and identifies conditions under which networks are adversarially trainable.
Findings
Networks without shortcuts are generally not adversarially trainable.
Adversarial training reduces network capacity.
Greater network width alleviates both issues.
Abstract
Although adversarial training is known to be effective against adversarial examples, training dynamics are not well understood. In this study, we present the first theoretical analysis of adversarial training in random deep neural networks without any assumptions on data distributions. We introduce a new theoretical framework based on mean field theory, which addresses the limitations of existing mean field-based approaches. Based on this framework, we derive (empirically tight) upper bounds of ℓq norm-based adversarial loss with ℓp norm-based adversarial examples for various values of p and q. Moreover, we prove that networks without shortcuts are generally not adversarially trainable and that adversarial training reduces network capacity. We also show that network width alleviates these issues. Furthermore, we present the various impacts of the input and output…
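To make the quantity in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes the adversarial loss is the largest ℓq change in the network output over input perturbations of ℓp norm ε, and it approximates that maximum by crude random search, which only gives a lower estimate. The network (a Gaussian-initialized ReLU MLP with variance 2/fan_in) and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def random_mlp(d_in, width, depth, d_out, rng):
    """Gaussian weights for a ReLU MLP; variance 2/fan_in is an assumed init."""
    dims = [d_in] + [width] * depth + [d_out]
    return [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m))
            for m, n in zip(dims[:-1], dims[1:])]

def forward(weights, x):
    """Plain feed-forward pass (no shortcuts): ReLU on hidden layers, linear output."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

def random_lp_perturbation(d, eps, p, rng):
    """Random perturbation whose l_p norm is exactly eps."""
    g = rng.normal(size=d)
    if np.isinf(p):
        return eps * np.sign(g)
    return eps * g / np.linalg.norm(g, ord=p)

def estimate_adv_loss(weights, x, eps, p, q, n_samples, rng):
    """Lower estimate of max ||f(x + eta) - f(x)||_q over ||eta||_p = eps.

    Random search is a stand-in for a real attack and underestimates the maximum.
    """
    clean = forward(weights, x)
    best = 0.0
    for _ in range(n_samples):
        eta = random_lp_perturbation(x.size, eps, p, rng)
        change = np.linalg.norm(forward(weights, x + eta) - clean, ord=q)
        best = max(best, change)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = random_mlp(d_in=32, width=256, depth=4, d_out=10, rng=rng)
    x = rng.normal(size=32)
    for p in (2, np.inf):
        print(p, estimate_adv_loss(weights, x, eps=0.05, p=p, q=2, n_samples=200, rng=rng))
```

Varying `width`, `depth`, `p`, and `q` in this sketch gives a rough feel for how the output change of a random network depends on those choices; it does not reproduce the paper's bounds.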
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
