Can Implicit Bias Imply Adversarial Robustness?
Hancheng Min, René Vidal

TL;DR
This paper investigates how the implicit bias of gradient-based training influences adversarial robustness, showing that certain architectures with polynomial ReLU activations can achieve both good generalization and robustness.
Contribution
It extends recent analyses of neuron alignment to show that shallow networks with polynomial ReLU (pReLU) activations trained by gradient flow both generalize well and are adversarially robust.
Findings
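Shallow networks with pReLU activations trained by gradient flow can be robust to adversarial attacks.
The implicit bias of gradient-based training can help or harm adversarial robustness, depending on the architecture.
The interplay between data structure and architecture design influences robustness.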
Abstract
The implicit bias of gradient-based training algorithms has been considered mostly beneficial as it leads to trained networks that often generalize well. However, Frei et al. (2023) show that such implicit bias can harm adversarial robustness. Specifically, they show that if the data consists of clusters with small inter-cluster correlation, a shallow (two-layer) ReLU network trained by gradient flow generalizes well, but it is not robust to adversarial attacks of small radius. Moreover, this phenomenon occurs despite the existence of a much more robust classifier that can be explicitly constructed from a shallow network. In this paper, we extend recent analyses of neuron alignment to show that a shallow network with a polynomial ReLU activation (pReLU) trained by gradient flow not only generalizes well but is also robust to adversarial attacks. Our results highlight the importance of…
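To make the architectural contrast in the abstract concrete, here is a minimal NumPy sketch of a shallow ReLU network next to a shallow pReLU network. The summary does not state the exact definition of pReLU, so the normalized-power form used below (ReLU raised to a power `p`, divided by the weight norm to the power `p - 1`) is an assumption, as are the function names and the toy weights. It illustrates the idea and is not the authors' implementation.

```python
import numpy as np


def relu_net(x, W, v):
    """Shallow ReLU network: sum_j v_j * max(<w_j, x>, 0)."""
    return float(v @ np.maximum(W @ x, 0.0))


def prelu_net(x, W, v, p=2):
    """Shallow network with an ASSUMED polynomial ReLU activation:
    sum_j v_j * max(<w_j, x>, 0)**p / ||w_j||**(p - 1).
    With p = 1 this reduces to the ordinary ReLU network above.
    """
    pre = np.maximum(W @ x, 0.0)          # ReLU pre-activations, one per neuron
    norms = np.linalg.norm(W, axis=1)     # ||w_j|| for each hidden neuron
    return float(v @ (pre ** p / norms ** (p - 1)))


# Toy example with hypothetical weights: two neurons, two input dimensions.
W = np.array([[1.0, 0.0], [0.0, 1.0]])    # first-layer weights (rows are w_j)
v = np.array([1.0, -1.0])                 # second-layer weights
x = np.array([2.0, 1.0])                  # input point

out_relu = relu_net(x, W, v)
out_prelu_p1 = prelu_net(x, W, v, p=1)
out_prelu_p2 = prelu_net(x, W, v, p=2)
```

Under this assumed form, dividing by the norm keeps the output positively 1-homogeneous in each `w_j`, as it is for the ReLU network, while the power `p` suppresses neurons that are only weakly aligned with the input.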
Taxonomy
Topics: Adversarial Robustness in Machine Learning
