Wide Two-Layer Networks can Learn from Adversarial Perturbations

Soichiro Kumano; Hiroshi Kera; Toshihiko Yamasaki

arXiv:2410.23677·cs.LG·January 22, 2025

TL;DR

This paper gives a theoretical explanation for why wide two-layer neural networks can learn from adversarial perturbations: the perturbations themselves contain enough class-specific features for a network trained on them to generalize.

Contribution

The paper proves that adversarial perturbations contain sufficient class-specific features, which explains the success of perturbation learning. The analysis assumes wide two-layer networks and holds for any data distribution.

Findings

1. Adversarial perturbations contain class-specific features.
2. Predictions of classifiers trained on mislabeled adversarial examples agree with those of classifiers trained on correctly labeled clean data.
3. The results hold for any data distribution.

Abstract

Adversarial examples have raised several open questions, such as why they can deceive classifiers and transfer between different models. A prevailing hypothesis to explain these phenomena suggests that adversarial perturbations appear as random noise but contain class-specific features. This hypothesis is supported by the success of perturbation learning, where classifiers trained solely on adversarial examples and the corresponding incorrect labels generalize well to correctly labeled test data. Although this hypothesis and perturbation learning are effective in explaining intriguing properties of adversarial examples, their solid theoretical foundation is limited. In this study, we theoretically explain the counterintuitive success of perturbation learning. We assume wide two-layer networks and the results hold for any data distribution. We prove that adversarial perturbations contain…
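The perturbation-learning procedure described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it swaps the wide two-layer network for a linear classifier on synthetic Gaussian data, uses NumPy, and draws the adversarial target labels at random, independently of the true class (so roughly half of them are wrong). The function names, the class mean `mu`, and the values of `n`, `d`, and `eps` are all assumptions made for this sketch.

```python
import numpy as np


def train_linear(x, y, steps=200, lr=0.1):
    """Fit w for sign(x @ w) by gradient descent on the logistic loss."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        margins = np.clip(y * (x @ w), -50.0, 50.0)
        # Gradient of mean(log(1 + exp(-margin))) with respect to w.
        coef = -y / (1.0 + np.exp(margins))
        w -= lr * (coef[:, None] * x).mean(axis=0)
    return w


def accuracy(w, x, y):
    return float(np.mean(np.sign(x @ w) == y))


def make_data(rng, n, mu):
    """Two Gaussian classes with means +mu and -mu (hypothetical toy data)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + rng.normal(size=(n, mu.size))
    return x, y


def perturbation_learning(seed=0, n=2000, d=20, eps=1.0):
    rng = np.random.default_rng(seed)
    mu = np.full(d, 0.5)  # assumed class mean, not taken from the paper
    x_train, y_train = make_data(rng, n, mu)
    x_test, y_test = make_data(rng, n, mu)

    # Step 1: train a standard classifier f on clean, correctly labeled data.
    w_f = train_linear(x_train, y_train)

    # Step 2: pick target labels independently of the true class, then
    # perturb each input toward its target. For a linear model, the
    # L2-bounded step that most increases the target score is along w_f.
    y_adv = rng.choice([-1.0, 1.0], size=n)
    eta = eps * y_adv[:, None] * (w_f / np.linalg.norm(w_f))
    x_adv = x_train + eta

    # Step 3: train a new classifier g from scratch, using only the
    # adversarial examples and their target labels.
    w_g = train_linear(x_adv, y_adv)

    # Step 4: evaluate g on clean test data with the correct labels.
    cosine = float(w_f @ w_g / (np.linalg.norm(w_f) * np.linalg.norm(w_g)))
    return {
        "clean_acc_f": accuracy(w_f, x_test, y_test),
        "clean_acc_g": accuracy(w_g, x_test, y_test),
        "cosine_f_g": cosine,
        "max_perturbation_norm": float(np.linalg.norm(eta, axis=1).max()),
    }


if __name__ == "__main__":
    print(perturbation_learning())
```

In this toy setting the target labels carry no information about the clean inputs, so the only signal that `g` can pick up is the perturbation itself. Comparing `clean_acc_g` with `clean_acc_f`, and inspecting `cosine_f_g`, gives a rough sense of how much class information the perturbations carry; the paper's actual setting (wide two-layer networks, arbitrary data distributions) is more general than this linear example.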

Peer Reviews

Decision·NeurIPS 2024 poster

Reviewer 01 · Rating 5 · Confidence 3

Strengths

1. The problem is very interesting and under-analyzed, so this paper addresses a significant problem in an original way.
2. The setting and assumptions are clearly described.

Weaknesses

1. I am overall confused about the choice of the kernel regime to explain perturbation learning. As the authors acknowledge, there is no feature learning for the choice of width in this paper (the output of the hidden units remains the same). It is not clear how such a framework can explain perturbation learning and, in particular, the "feature hypothesis," which the authors claim to explain. Since there is no feature learning in this regime, there should not be any "feature hypothesis." This is not to say

Code & Models

Repositories

s-kumano/perturbation-learning
PyTorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection