Adversarial robustness via stochastic regularization of neural activation sensitivity
Gil Fidel, Ron Bitton, Ziv Katzir, Asaf Shabtai

TL;DR
This paper introduces a stochastic regularization method that reduces neural activation sensitivity and pushes decision boundaries away from data points, enhancing adversarial robustness against various attack strategies.
Contribution
It proposes a novel regularization technique that simultaneously flattens loss gradients and increases boundary margins, addressing two key adversarial defense goals.
Findings
Outperforms previous defenses in empirical tests
Effective against adaptive adversarial attacks
Theoretically grounded approach
Abstract
Recent works have shown that the input domain of any machine learning classifier is bound to contain adversarial examples. Thus we can no longer hope to immunize classifiers against adversarial examples and instead can only aim to achieve the following two defense goals: 1) making adversarial examples harder to find, or 2) weakening their adversarial nature by pushing them further away from correctly classified data points. Most, if not all, of the previously suggested defense mechanisms attend to just one of those two goals and, as such, could be bypassed by adaptive attacks that take the defense mechanism into consideration. In this work we suggest a novel defense mechanism that simultaneously addresses both defense goals: We flatten the gradients of the loss surface, making adversarial examples harder to find, using a novel stochastic regularization term that explicitly decreases the…
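The abstract is cut off before it defines the regularization term, so its exact form is not available here. As a rough illustration of the general idea of penalizing how much activations change under small random input perturbations, a minimal sketch follows. The function names, the one-layer tanh network, the Gaussian noise model, and the parameters `sigma`, `n_samples`, and `lam` are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def hidden_activations(weights, biases, x):
    """Return tanh activations of one dense layer (hypothetical toy network)."""
    return [
        math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def sensitivity_penalty(weights, biases, x, sigma=0.01, n_samples=8, rng=None):
    """Monte Carlo estimate of E ||h(x + delta) - h(x)||^2, delta ~ N(0, sigma^2 I).

    Illustrative stand-in only; this is not the regularizer defined in the paper.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    clean = hidden_activations(weights, biases, x)
    total = 0.0
    for _ in range(n_samples):
        # Draw a fresh random perturbation of the input for each sample.
        noisy_x = [x_j + rng.gauss(0.0, sigma) for x_j in x]
        noisy = hidden_activations(weights, biases, noisy_x)
        total += sum((a - c) ** 2 for a, c in zip(noisy, clean))
    return total / n_samples


def regularized_loss(task_loss, penalty, lam=1.0):
    """Add the weighted sensitivity penalty to the task loss."""
    return task_loss + lam * penalty
```

In this sketch, a layer whose activations barely move under input noise incurs a small penalty, so minimizing the combined loss would favor less sensitive activations. How the paper's term relates to boundary margins is not recoverable from the truncated abstract and is not modeled here.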
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security
