Theoretical Understanding of Learning from Adversarial Perturbations

Soichiro Kumano; Hiroshi Kera; Toshihiko Yamasaki

arXiv:2402.10470 · cs.LG · February 19, 2024 · 1 cites

TL;DR

This paper develops a theoretical framework to understand how adversarial perturbations contain class features and influence neural network generalization, revealing that even minimal perturbations can encode sufficient class information.

Contribution

It introduces a theoretical model demonstrating that adversarial perturbations include class features and affect decision boundaries similarly to standard samples.

Findings

01 Perturbations of a few pixels contain enough class features for generalization.

02 The decision boundary from learning with perturbations closely matches that from standard samples.

03 Theoretical insights explain transferability and deception of adversarial examples.

Abstract

It is not fully understood why adversarial examples can deceive neural networks and transfer between different networks. To elucidate this, several studies have hypothesized that adversarial perturbations, while appearing as noises, contain class features. This is supported by empirical evidence showing that networks trained on mislabeled adversarial examples can still generalize well to correctly labeled test samples. However, a theoretical understanding of how perturbations include class features and contribute to generalization is limited. In this study, we provide a theoretical framework for understanding learning from perturbations using a one-hidden-layer network trained on mutually orthogonal samples. Our results highlight that various adversarial perturbations, even perturbations of a few pixels, contain sufficient class features for generalization. Moreover, we reveal that the…
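The setup the abstract describes can be illustrated with a small NumPy sketch. This is not the authors' code, and several details are assumptions made only for illustration: the leaky ReLU activation, the fixed second-layer weights, the logistic loss, the single normalized gradient step used as the perturbation, and every hyperparameter. The helper names (`train`, `input_grad`, `demo`) are hypothetical. A classifier `f` is trained on mutually orthogonal samples, each sample is perturbed toward a randomly chosen target label, and a second classifier `g` is trained only on the perturbed samples with those target labels.

```python
import numpy as np

SLOPE = 0.1  # leaky ReLU slope (assumed value)


def forward(W, a, X):
    """One-hidden-layer network: f(x) = sum_j a_j * leaky_relu(w_j . x)."""
    Z = X @ W.T
    return np.where(Z > 0, Z, SLOPE * Z) @ a


def input_grad(W, a, X):
    """Gradient of the network output with respect to each input row."""
    D = np.where(X @ W.T > 0, 1.0, SLOPE)
    return (D * a) @ W


def train(X, y, m=64, steps=500, lr=0.5, seed=0):
    """Gradient descent on the logistic loss; only the first layer is trained."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((m, X.shape[1]))
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)
    for _ in range(steps):
        Z = X @ W.T
        D = np.where(Z > 0, 1.0, SLOPE)
        out = np.where(Z > 0, Z, SLOPE * Z) @ a
        # Derivative of log(1 + exp(-y * out)) with respect to out.
        g_out = -y / (1.0 + np.exp(np.clip(y * out, -50, 50)))
        W -= lr * ((g_out[:, None] * D) * a).T @ X / len(y)
    return W, a


def demo(d=50, n=20, eps=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Rows of X are mutually orthogonal unit vectors.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    X = Q[:, :n].T
    y = rng.choice([-1.0, 1.0], size=n)
    W_f, a_f = train(X, y, seed=seed)

    # Target labels are drawn independently of the true labels y.
    t = rng.choice([-1.0, 1.0], size=n)
    G = input_grad(W_f, a_f, X)
    G /= np.linalg.norm(G, axis=1, keepdims=True) + 1e-12
    X_adv = X + eps * t[:, None] * G  # push f's output toward t

    # g sees only perturbed samples paired with the target labels.
    W_g, a_g = train(X_adv, t, seed=seed + 1)

    # Fraction of random probe points on which f and g predict the same sign.
    X_probe = rng.standard_normal((1000, d))
    agree = np.mean(
        np.sign(forward(W_f, a_f, X_probe)) == np.sign(forward(W_g, a_g, X_probe))
    )
    return X, X_adv, float(agree)


if __name__ == "__main__":
    _, _, agreement = demo()
    print(f"f/g prediction agreement on probe points: {agreement:.3f}")
```

Because the target labels are independent of the original labels, any agreement between `g` and `f` above chance would have to come from the perturbations themselves. The printed number depends on the assumed hyperparameters and says nothing about the paper's formal results.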

Peer Reviews

Decision·ICLR 2024 poster

Reviewer 01 · Rating 5 · marginally below the acceptance threshold · Confidence 3

Strengths

Theoretical understanding of how neural networks learn in the presence of adversarial perturbations is a challenging but important task. The paper is technically solid, with clearly presented assumptions and theoretical results. Theoretical results are accompanied by adequate discussions and high-level explanations of the proof idea. The considered setting where adversarial perturbations are added to uniform/Gaussian noise is new, which nicely supports the paper's key argument that generated adv

Weaknesses

While I appreciate the theoretical nature of this paper, the motivation and some considered settings for studying adversarial perturbations in the context of learning need to be more convincing from my perspective. In the abstract, the paper claims that the phenomenon “neural networks trained on mislabeled samples with adversarial perturbations can generalize to natural test data” is counter-intuitive. I do not understand why this is a counter-intuitive phenomenon and how this phenomenon motiva

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

- The idea to apply the implicit bias results to learning from adversarial perturbations is novel.
- The theoretical results admit interesting and relevant interpretations (e.g., the effect of learning from mislabeled data vs. perturbation data).

Weaknesses

- It is unclear what the motivation for studying the uniform perturbation model is.
- Overall, the investigated model seems to be quite simple and the theoretical assumptions restrictive (see also questions below).
- The text is rather densely written (e.g. subsections 4.2 and 4.3), which makes understanding the main statements and insights rather difficult (e.g. Theorem 4.2).

Reviewer 03 · Rating 5 · marginally below the acceptance threshold · Confidence 3

Strengths

This paper provides a theoretical justification for an empirical phenomenon first observed in Ilyas et al. 2019. To the best of my knowledge, no theoretical work has focused on this direction before.

Weaknesses

I’m not exactly following the motivation of this work. Normally, when people consider adversarial training to gain robustness, the adversarial training examples are generated at each iteration based on the current model weights. Yet in this paper, from Definition 3.2, it seems that the adversarial examples of the training samples are generated beforehand, independent of the current model weights, and are fixed during the subsequent training procedure. Therefore it’s unclear to me whether the perturba

Code & Models

Repositories

s-kumano/learning-from-adversarial-perturbations
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning