Gradient Methods Provably Converge to Non-Robust Networks
Gal Vardi, Gilad Yehudai, Ohad Shamir

TL;DR
This paper proves that, in natural settings, gradient flow training of depth-2 ReLU networks leads to non-robust models that are susceptible to small adversarial perturbations, because of an implicit bias towards margin maximization.
Contribution
It demonstrates that the implicit bias in gradient flow training favors non-robust networks, even when robust solutions exist, revealing a fundamental reason for adversarial vulnerability.
Findings
Gradient flow leads to non-robust networks in certain settings.
Networks satisfying max-margin KKT conditions are non-robust.
Implicit bias towards margin maximization causes vulnerability.
Abstract
Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-2 ReLU networks trained with gradient flow are provably non-robust (susceptible to small adversarial ℓ2-perturbations), even when robust networks that classify the training dataset correctly exist. Perhaps surprisingly, we show that the well-known implicit bias towards margin maximization induces bias towards non-robust networks, by proving that every network which satisfies the KKT conditions of the max-margin problem is non-robust.
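For readers unfamiliar with the "KKT conditions of the max-margin problem" mentioned in the abstract, the block below sketches the standard formulation from the implicit-bias literature. The notation is an assumption introduced here for illustration and does not appear in the text above: N_θ is the network with parameters θ, and (x_i, y_i) for i = 1..n are training points with labels y_i in {-1, +1}. For ReLU networks the gradient is usually read as an element of the Clarke subdifferential.

```latex
% Max-margin problem in parameter space (notation assumed, see lead-in)
\begin{align*}
  \min_{\theta} \ \tfrac{1}{2}\,\|\theta\|_2^2
  \quad \text{s.t.} \quad y_i\, N_\theta(x_i) \ge 1 \quad \text{for all } i = 1,\dots,n .
\end{align*}

% KKT conditions: there exist multipliers \lambda_1,\dots,\lambda_n such that
\begin{align*}
  \theta &= \sum_{i=1}^{n} \lambda_i\, y_i\, \nabla_\theta N_\theta(x_i)
      && \text{(stationarity)} \\
  y_i\, N_\theta(x_i) &\ge 1 \quad \text{for all } i
      && \text{(primal feasibility)} \\
  \lambda_i &\ge 0 \quad \text{for all } i
      && \text{(dual feasibility)} \\
  \lambda_i \bigl( y_i\, N_\theta(x_i) - 1 \bigr) &= 0 \quad \text{for all } i
      && \text{(complementary slackness)}
\end{align*}
```

Under this reading, the abstract's claim is that any θ satisfying these four conditions yields a non-robust network in the settings the paper considers.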
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · COVID-19 diagnosis using AI
