Theoretical Understanding of Learning from Adversarial Perturbations
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

TL;DR
This paper develops a theoretical framework for understanding how adversarial perturbations contain class features and how learning from them affects neural network generalization. It shows that even perturbations of only a few pixels can encode sufficient class information.
Contribution
It introduces a theoretical model showing that adversarial perturbations contain class features and that learning from them yields decision boundaries similar to those obtained from standard samples.
Findings
Perturbations of a few pixels contain enough class features for generalization.
The decision boundary from learning with perturbations closely matches that from standard samples.
The theory offers an explanation for why adversarial examples deceive networks and transfer between them.
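The first two findings can be illustrated with a toy simulation. This is a hypothetical setup, not the paper's model or experiments. A random linear `teacher` defines the classes. Perturbations computed from the teacher are added to uniform noise that carries no class signal of its own, and each noisy sample is labeled with the perturbation's random target class. The learner is the simple rule w = Σ y_n x_n, used as a stand-in for a trained one-hidden-layer network. The perturbation functions, the choice of the `K_PIXELS` largest-weight coordinates for the few-pixel variant, and all sizes are illustrative assumptions.

```python
import math
import random

# Hypothetical toy setup, NOT the paper's model: a random linear "teacher"
# defines the true classes, perturbations built from it are added to uniform
# noise that carries no class signal, and each learner is the simple rule
# w = sum_n y_n x_n instead of a trained one-hidden-layer network.
D, N_TRAIN, N_TEST, K_PIXELS, EPS = 100, 2000, 2000, 5, 1.0
rng = random.Random(0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sign(z):
    return 1 if z >= 0 else -1

teacher = [rng.gauss(0, 1) for _ in range(D)]
t_norm = math.sqrt(dot(teacher, teacher))
top_k = sorted(range(D), key=lambda i: abs(teacher[i]), reverse=True)[:K_PIXELS]

def dense_perturbation(target):
    # L2 step of norm EPS along the teacher direction, toward `target`.
    return [EPS * target * v / t_norm for v in teacher]

def few_pixel_perturbation(target):
    # Changes only K_PIXELS coordinates: those with the largest |teacher| weight.
    delta = [0.0] * D
    for i in top_k:
        delta[i] = EPS * target * sign(teacher[i])
    return delta

def fit(samples):
    # Stand-in learner: w = sum_n y_n x_n.
    w = [0.0] * D
    for x, y in samples:
        for i in range(D):
            w[i] += y * x[i]
    return w

def perturbed_noise_set(perturb):
    # Uniform noise + perturbation, labeled with the perturbation's random target.
    data = []
    for _ in range(N_TRAIN):
        t = rng.choice((-1, 1))
        noise = [rng.uniform(-1, 1) for _ in range(D)]
        data.append(([a + b for a, b in zip(noise, perturb(t))], t))
    return data

def clean_set(n):
    # Clean Gaussian samples, correctly labeled by the teacher.
    xs = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(n)]
    return [(x, sign(dot(teacher, x))) for x in xs]

w_dense = fit(perturbed_noise_set(dense_perturbation))
w_sparse = fit(perturbed_noise_set(few_pixel_perturbation))
w_std = fit(clean_set(N_TRAIN))  # standard learning on correctly labeled data

test = clean_set(N_TEST)

def accuracy(w):
    return sum(sign(dot(w, x)) == y for x, y in test) / N_TEST

acc_dense, acc_sparse, acc_std = accuracy(w_dense), accuracy(w_sparse), accuracy(w_std)
agree = sum(sign(dot(w_dense, x)) == sign(dot(w_std, x)) for x, _ in test) / N_TEST
print(f"dense perturbations:   test acc {acc_dense:.3f}")
print(f"{K_PIXELS}-pixel perturbations: test acc {acc_sparse:.3f}")
print(f"standard samples:      test acc {acc_std:.3f}")
print(f"dense vs standard boundary agreement: {agree:.3f}")
```

The noise terms point in random directions and largely cancel across samples. Every perturbation term, by contrast, adds up along the class direction, so the learned weights align with the teacher. The few-pixel variant touches only `K_PIXELS` of `D` coordinates and therefore aligns with the teacher only partially. In this toy it generalizes above chance but less well than the dense variant.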
Abstract
It is not fully understood why adversarial examples can deceive neural networks and transfer between different networks. To elucidate this, several studies have hypothesized that adversarial perturbations, while appearing as noise, contain class features. This is supported by empirical evidence showing that networks trained on mislabeled adversarial examples can still generalize well to correctly labeled test samples. However, a theoretical understanding of how perturbations include class features and contribute to generalization is limited. In this study, we provide a theoretical framework for understanding learning from perturbations using a one-hidden-layer network trained on mutually orthogonal samples. Our results highlight that various adversarial perturbations, even perturbations of a few pixels, contain sufficient class features for generalization. Moreover, we reveal that the…
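The mislabeled-adversarial-example experiment described in the abstract can be made concrete with a minimal sketch under assumed conditions. This is not the paper's network or proof. The first N standard basis vectors serve as mutually orthogonal training samples, and a random linear teacher supplies the true labels. Each sample is moved by an L2 step of size `EPS` toward the opposite class and then relabeled with that wrong class. The learner is again the stand-in rule w = Σ y_n x_n. For comparison, the sketch also trains on the same wrong labels without any perturbation.

```python
import math
import random

# Hypothetical toy setup, NOT the paper's model: the first N standard basis
# vectors are the mutually orthogonal training samples, a random linear
# teacher gives the true labels, and the learner is w = sum_n y_n x_n.
D, N, N_TEST, EPS = 400, 100, 2000, 0.5
rng = random.Random(1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sign(z):
    return 1 if z >= 0 else -1

teacher = [rng.gauss(0, 1) for _ in range(D)]
t_norm = math.sqrt(dot(teacher, teacher))
unit = [v / t_norm for v in teacher]

def basis(n):
    e = [0.0] * D
    e[n] = 1.0
    return e

def fit(samples):
    w = [0.0] * D
    for x, y in samples:
        for i in range(D):
            w[i] += y * x[i]
    return w

train = [(basis(n), sign(teacher[n])) for n in range(N)]

# Adversarial example: L2 step of size EPS toward the opposite class,
# then labeled with that opposite (wrong) class.
adv = [([xi - y * EPS * ui for xi, ui in zip(x, unit)], -y) for x, y in train]
flip_only = [(x, -y) for x, y in train]  # same wrong labels, no perturbation

test = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N_TEST)]
labels = [sign(dot(teacher, x)) for x in test]

def accuracy(w):
    return sum(sign(dot(w, x)) == y for x, y in zip(test, labels)) / N_TEST

fooled = sum(sign(dot(teacher, x)) == t for x, t in adv) / N
acc_adv = accuracy(fit(adv))
acc_flip = accuracy(fit(flip_only))
print(f"teacher predicts the flipped label on {fooled:.0%} of adversarial examples")
print(f"test acc, trained on mislabeled adversarial examples: {acc_adv:.3f}")
print(f"test acc, trained on mislabeled clean samples:       {acc_flip:.3f}")
```

For this learner the weights decompose as w = -Σ_n y_n x_n + N·EPS·û, where û is the unit teacher direction. The first term comes from the wrong labels and the second from the perturbations. With these sizes the perturbation term dominates, so the model generalizes to correctly labeled test data. Training on mislabeled clean samples keeps only the first term, which pushes test accuracy below chance.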
Peer Reviews
Decision: ICLR 2024 poster
Theoretical understanding of how neural networks learn in the presence of adversarial perturbations is a challenging but important task. The paper is technically solid, with clearly presented assumptions and theoretical results. Theoretical results are accompanied by adequate discussions and high-level explanations of the proof idea. The considered setting where adversarial perturbations are added to uniform/Gaussian noise is new, which nicely supports the paper's key argument that generated adv
While I appreciate the theoretical nature of this paper, the motivation and some considered settings for studying adversarial perturbations in the context of learning need to be more convincing from my perspective. In the abstract, the paper claims that the phenomenon “neural networks trained on mislabeled samples with adversarial perturbations can generalize to natural test data” is counter-intuitive. I do not understand why this is a counter-intuitive phenomenon and how this phenomenon motiva
- The idea of applying implicit bias results to learning from adversarial perturbations is novel.
- The theoretical results admit interesting and relevant interpretations (e.g., the effect of learning from mislabeled data vs. perturbation data).
- It is unclear what the motivation for studying the uniform perturbation model is.
- Overall, the investigated model seems quite simple and the theoretical assumptions restrictive (see also questions below).
- The text is rather densely written (e.g., Subsections 4.2 and 4.3), which makes the main statements and insights difficult to understand (e.g., Theorem 4.2).
This paper provides a theoretical justification for an empirical phenomenon first observed in Ilyas et al. (2019). To the best of my knowledge, no prior theoretical work has focused on this direction.
I’m not exactly following the motivation of this work. Normally, when people use adversarial training to gain robustness, the adversarial training examples are generated from the current model weights at each iteration. In this paper, however, Definition 3.2 suggests that the adversarial examples of the training samples are generated beforehand, independently of the current model weights, and remain fixed during the subsequent training procedure. Therefore it’s unclear to me whether the perturba
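The distinction raised in this review can be sketched for a linear model with logistic loss. For such a model, the L2-normalized input-gradient step that increases the loss for label y is -y·w/‖w‖. Following the reviewer's reading of Definition 3.2, `learn_from_fixed_perturbations` computes flipped-label perturbations once from a fixed `source_w` and freezes them. `adversarial_training`, by contrast, recomputes perturbations from the current weights at every step. Both functions, the toy data, and the hyperparameters are hypothetical illustrations, not code from the paper.

```python
import math

# Hypothetical sketch: linear model f(x) = w.x with logistic loss.
# For label y, the L2-normalized input-gradient step that increases the loss
# is -y * w / ||w||, so the perturbation depends on whichever w is used.

def l2_attack(w, x, y, eps):
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [xi - eps * y * wi / norm for xi, wi in zip(x, w)]

def logistic_step(w, batch, lr):
    # One gradient-descent step on the mean of log(1 + exp(-y * w.x)).
    grad = [0.0] * len(w)
    for x, y in batch:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # -y * sigmoid(-margin), written with tanh to avoid overflow.
        coef = -y * 0.5 * (1.0 - math.tanh(margin / 2.0))
        for i, xi in enumerate(x):
            grad[i] += coef * xi / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def learn_from_fixed_perturbations(source_w, data, w0, eps, steps, lr):
    # Perturbations toward the flipped label are computed ONCE from a fixed
    # source model, relabeled with that flipped label, and frozen.
    frozen = [(l2_attack(source_w, x, y, eps), -y) for x, y in data]
    w = list(w0)
    for _ in range(steps):
        w = logistic_step(w, frozen, lr)
    return w, frozen

def adversarial_training(data, w0, eps, steps, lr):
    # Perturbations are recomputed from the CURRENT weights at every step
    # and keep the original labels.
    w, batch = list(w0), []
    for _ in range(steps):
        batch = [(l2_attack(w, x, y, eps), y) for x, y in data]
        w = logistic_step(w, batch, lr)
    return w, batch

data = [([1.0, 0.5], 1), ([-1.0, 0.2], -1), ([0.8, -0.4], 1), ([-0.6, -0.9], -1)]
source_w = [1.0, 0.0]
_, frozen_a = learn_from_fixed_perturbations(source_w, data, [1.0, 0.0], 0.3, 3, 0.1)
_, frozen_b = learn_from_fixed_perturbations(source_w, data, [0.0, 1.0], 0.3, 3, 0.1)
_, batch_a = adversarial_training(data, [1.0, 0.0], 0.3, 3, 0.1)
_, batch_b = adversarial_training(data, [0.0, 1.0], 0.3, 3, 0.1)
print(frozen_a == frozen_b)  # True: the frozen set ignores the learner's weights
print(batch_a == batch_b)    # False: perturbations track the current weights
```

Only the second protocol ties the perturbations to the model being trained. In the first, the perturbations depend solely on `source_w`, whatever the learner's initialization.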
Taxonomy
Topics: Adversarial Robustness in Machine Learning
