TL;DR
This paper proposes Jacobian regularization as a way to make neural networks up to four times more robust to universal adversarial perturbations (UAPs) without sacrificing clean accuracy.
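The sketch below shows one Jacobian-regularized training objective. It assumes a common form of the penalty: cross-entropy plus λ times the squared Frobenius norm of the input-output Jacobian of the logits. This summary does not say which norm, or which layer's Jacobian, the paper penalizes. The toy tanh network, `jacobian_regularized_loss`, and the weight `lam` are all hypothetical. The Jacobian is written out analytically so that the example needs only NumPy. A real training loop would use an autodiff framework to differentiate this loss with respect to the weights.

```python
import numpy as np

# Hypothetical toy classifier: logits = W2 @ tanh(W1 @ x + b1) + b2.
def forward(params, x):
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ x + b1) + b2

def input_jacobian(params, x):
    """Analytic Jacobian d(logits)/dx with shape (num_classes, input_dim)."""
    W1, b1, W2, _ = params
    h = np.tanh(W1 @ x + b1)
    # Chain rule: d tanh(z)/dz = 1 - tanh(z)**2 scales each row of W1.
    return W2 @ ((1.0 - h ** 2)[:, None] * W1)

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    z = logits - logits.max()
    return np.log(np.exp(z).sum()) - z[label]

def jacobian_regularized_loss(params, xs, labels, lam):
    """Mean cross-entropy plus lam times the mean squared Frobenius norm of the
    input Jacobian (an assumed form of the penalty, not taken from the paper)."""
    ce = np.mean([cross_entropy(forward(params, x), y) for x, y in zip(xs, labels)])
    penalty = np.mean([np.sum(input_jacobian(params, x) ** 2) for x in xs])
    return ce + lam * penalty
```

Setting `lam = 0` recovers plain cross-entropy. In practice, `lam` would be tuned so that clean accuracy is preserved.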
Contribution
The work derives upper bounds on UAP effectiveness in terms of the norms of data-dependent Jacobians, and it shows empirically that Jacobian regularization increases robustness to UAPs up to fourfold while preserving clean performance.
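This summary does not reproduce the paper's bounds. The intuition behind Jacobian-based bounds can still be shown with a first-order expansion, f(x + v) ≈ f(x) + J(x)v. This gives ||J(x)v||_2 ≤ ||J(x)||_2 ||v||_2 for the linearized change. A universal perturbation v is shared by every input, so the largest Jacobian spectral norm over the data caps its linearized effect on all of them. The sketch below checks these generic inequalities with NumPy. It is not the paper's derivation, and the helper names are hypothetical.

```python
import numpy as np

def linearized_change(J, v):
    """Norm of the first-order output change J @ v caused by perturbation v."""
    return np.linalg.norm(J @ v)

def per_input_bound(J, v):
    """Operator-norm bound ||J||_2 * ||v||_2 on the linearized change."""
    # For a 2-D array, ord=2 gives the largest singular value.
    return np.linalg.norm(J, 2) * np.linalg.norm(v)

def universal_bound(jacobians, eps):
    """Bound on the linearized change of ANY single v with ||v||_2 <= eps,
    valid simultaneously for every input: eps * max_i ||J(x_i)||_2."""
    return eps * max(np.linalg.norm(J, 2) for J in jacobians)
```

Penalizing Jacobian norms lowers this cap. The Frobenius norm upper-bounds the spectral norm, so a Frobenius penalty also constrains the spectral norm.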
Findings
Jacobian regularization increases robustness to UAPs by up to four times.
A new metric for the strength of shared adversarial perturbations between pairs of inputs correlates strongly with the robustness observed on benchmark datasets.
Practical universal attacks can be mitigated without sacrificing clean accuracy.
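This summary does not define the paper's pairwise metric. The score below is a purely hypothetical illustration of what "shared perturbation strength" between two inputs x1 and x2 could measure. It takes the best total first-order effect that one unit perturbation can have on both inputs, lambda_max(J1^T J1 + J2^T J2). It divides that by the total achievable if each input were attacked separately, sigma_max(J1)^2 + sigma_max(J2)^2. The score lies in [0.5, 1]. It equals 1 when both Jacobians share their most sensitive input direction. It can fall as low as 0.5 when they are sensitive to orthogonal directions, for example two rank-one Jacobians with orthogonal rows. `shared_direction_score` is an assumed name.

```python
import numpy as np

def shared_direction_score(J1, J2):
    """Hypothetical pairwise score in [0.5, 1].

    lambda_max(J1^T J1 + J2^T J2) is the largest value of
    ||J1 v||^2 + ||J2 v||^2 over unit vectors v, i.e. the best one shared
    perturbation can do on both inputs. It is divided by
    sigma_max(J1)^2 + sigma_max(J2)^2, the total if each input had its own
    best perturbation. Undefined if both Jacobians are zero.
    """
    M = J1.T @ J1 + J2.T @ J2
    shared = np.linalg.eigvalsh(M)[-1]  # eigvalsh returns eigenvalues in ascending order
    separate = np.linalg.norm(J1, 2) ** 2 + np.linalg.norm(J2, 2) ** 2
    return shared / separate
```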
Abstract
Universal Adversarial Perturbations (UAPs) are input perturbations that can fool a neural network on large sets of data. They are a class of attacks that represents a significant threat as they facilitate realistic, practical, and low-cost attacks on neural networks. In this work, we derive upper bounds for the effectiveness of UAPs based on norms of data-dependent Jacobians. We empirically verify that Jacobian regularization greatly increases model robustness to UAPs by up to four times whilst maintaining clean performance. Our theoretical analysis also allows us to formulate a metric for the strength of shared adversarial perturbations between pairs of inputs. We apply this metric to benchmark datasets and show that it is highly correlated with the actual observed robustness. This suggests that realistic and practical universal attacks can be reliably mitigated without sacrificing…
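A UAP is a single perturbation v that is added unchanged to every input. It is typically constrained to a norm ball ||v|| ≤ ε. Its effectiveness is commonly reported as a fooling rate, the fraction of inputs whose predicted label changes. This summary does not state the paper's exact evaluation protocol, so the sketch below is only an illustration using a toy classifier. `fooling_rate` and `project_l2` are hypothetical helpers. The projection step is the kind iterative UAP attacks use to keep v within the budget.

```python
import numpy as np

def fooling_rate(predict, xs, v):
    """Fraction of inputs whose predicted label changes when the same
    perturbation v is added to every input in the batch xs."""
    clean = predict(xs)
    perturbed = predict(xs + v)  # v broadcasts across the batch
    return float(np.mean(clean != perturbed))

def project_l2(v, eps):
    """Rescale v onto the L2 ball of radius eps if it lies outside it."""
    n = np.linalg.norm(v)
    return v if n <= eps else v * (eps / n)
```

Here `predict` is any function that maps a batch of inputs to integer labels, for example `lambda b: np.argmax(b @ W.T, axis=1)` for a linear model with weights `W`.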
