Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks
Ramin Barati, Reza Safabakhsh, Mohammad Rahmati

TL;DR
This paper investigates why adversarial examples exist in neural networks and proposes a theoretical framework based on convergence concepts to explain the phenomenon, unifying and extending previous explanations.
Contribution
It introduces a convergence-based theoretical explanation for adversarial examples and adversarial training, relates the objectives of evasion attacks and adversarial training to concepts already defined in learning theory, and demonstrates the framework's practical applicability.
Findings
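Pointwise convergence in ANNs can explain both the existence of adversarial examples and the effect of adversarial training.
The framework extends and unifies several existing explanations and offers alternative accounts of the observations they report.
Experiments indicate that the framework is useful for studying the phenomenon and applicable to real-world problems.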
Abstract
In this paper, we study the existence of adversarial examples and adversarial training from the standpoint of convergence and provide evidence that pointwise convergence in ANNs can explain these observations. The main contribution of our proposal is that it relates the objective of evasion attacks and adversarial training to concepts already defined in learning theory. We also extend and unify some of the other proposals in the literature and provide alternative explanations of the observations made in those proposals. Through different experiments, we demonstrate that the framework is valuable in the study of the phenomenon and is applicable to real-world problems.
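The abstract leans on the notion of pointwise convergence without defining it. The sketch below is a toy illustration, not the paper's construction: it assumes the standard textbook distinction between pointwise and uniform convergence and uses the hypothetical sequence f_n(x) = x**n on [0, 1) as a stand-in for a model improving over training. The function names (`f_n`, `error_at_point`, `worst_case_error`, `bad_point`) are introduced here for illustration only. At any fixed input the error shrinks to zero, yet for every n there remains some input with large error, which is loosely the kind of gap an evasion attack searches for.

```python
def f_n(x: float, n: int) -> float:
    """Toy sequence f_n(x) = x**n on [0, 1); its pointwise limit is 0."""
    return x ** n


def error_at_point(x: float, n: int) -> float:
    """Distance between f_n(x) and the pointwise limit 0 at a fixed x."""
    return abs(f_n(x, n) - 0.0)


def worst_case_error(n: int, grid_size: int = 10_000) -> float:
    """Approximate sup-norm error on a grid over [0, 1).

    Uniform convergence would require this value to go to 0 as n grows.
    """
    return max(error_at_point(k / grid_size, n) for k in range(grid_size))


def bad_point(n: int, target: float = 0.5) -> float:
    """Return an x in [0, 1) where the error is exactly `target`.

    Solving x**n = target gives x = target ** (1 / n). Such a point exists
    for every n, so the error never becomes small everywhere at once.
    """
    return target ** (1.0 / n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        x_bad = bad_point(n)
        print(
            f"n={n:5d}  error at x=0.9: {error_at_point(0.9, n):.3e}  "
            f"grid sup error: {worst_case_error(n):.3f}  "
            f"bad point: {x_bad:.6f}"
        )
```

Running the script shows the error at the fixed point x = 0.9 collapsing as n grows while the bad point merely moves closer to 1. Under the assumptions above, this is the sense in which convergence at every individual input does not rule out inputs with large error.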
