GUARDIAN: Safety Filtering for Systems with Perception Models Subject to Adversarial Attacks
Nicholas Rober, Alex Rose, Jonathan P. How

TL;DR
GUARDIAN is a safety filtering framework for systems whose neural network-based state estimators may be subject to adversarial attacks. It computes guaranteed bounds on the state estimate and adjusts the nominal control input accordingly, so that safety is formally guaranteed.
Contribution
This paper introduces GUARDIAN, a novel safety filtering method that offers formal safety guarantees for systems with NN estimators under adversarial conditions.
Findings
GUARDIAN defends against adversarial perturbations to the observations that feed an NN-based state estimator.
Its formal safety guarantees come from neural network verification tools, which bound the state estimate at runtime.
Numerical experiments validate its robustness and effectiveness.
Abstract
Safety filtering is an effective method for enforcing constraints in safety-critical systems, but existing methods typically assume perfect state information. This limitation is especially problematic for systems that rely on neural network (NN)-based state estimators, which can be highly sensitive to noise and adversarial input perturbations. We address these problems by introducing GUARDIAN: Guaranteed Uncertainty-Aware Reachability Defense against Adversarial INterference, a safety filtering framework that provides formal safety guarantees for systems with NN-based state estimators. At runtime, GUARDIAN uses neural network verification tools to provide guaranteed bounds on the system's state estimate given possible perturbations to its observation. It then uses a modified Hamilton-Jacobi reachability formulation to construct a safety filter that adjusts the nominal control input…
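The two runtime steps described in the abstract (bound the state estimate, then filter the control) can be illustrated with a toy sketch. This is not the paper's implementation. As assumptions for illustration only, plain interval bound propagation stands in for the neural network verification tools, and a one-step worst-case clamp on a 1-D integrator (x_next = x + u*dt, safe set x <= x_max) stands in for the modified Hamilton-Jacobi reachability formulation. Every function name, the network weights, and the perturbation radius eps are hypothetical.

```python
"""Toy sketch: bound an NN state estimate under observation perturbations,
then filter the control against the worst case. All names are hypothetical."""


def affine_bounds(weights, bias, lo, hi):
    """Interval arithmetic for y = W x + b over the box lo <= x <= hi."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        # A non-negative weight attains its minimum at lo[j], a negative one at hi[j].
        out_lo.append(b + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lo, hi)))
        out_hi.append(b + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lo, hi)))
    return out_lo, out_hi


def forward(layers, x):
    """Evaluate the estimator: affine layers with ReLU between them."""
    for i, (weights, bias) in enumerate(layers):
        x = [b + sum(w * v for w, v in zip(row, x)) for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def estimate_bounds(layers, obs, eps):
    """Bound the estimator output for every observation within eps (L-inf) of obs."""
    lo = [o - eps for o in obs]
    hi = [o + eps for o in obs]
    for i, (weights, bias) in enumerate(layers):
        lo, hi = affine_bounds(weights, bias, lo, hi)
        if i < len(layers) - 1:
            # ReLU is monotone, so it maps interval endpoints to interval endpoints.
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return lo, hi


def filter_control(u_nom, x_hi, x_max, dt, u_min, u_max):
    """Clamp u so that x_hi + u*dt <= x_max for the toy dynamics x_next = x + u*dt.

    If even u_min cannot satisfy the constraint, u_min is returned and this
    toy filter offers no guarantee for that step.
    """
    u_cap = (x_max - x_hi) / dt
    return max(u_min, min(u_nom, u_cap, u_max))


# Hypothetical two-layer estimator mapping a 2-D observation to a 1-D state.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]

if __name__ == "__main__":
    lo, hi = estimate_bounds(LAYERS, obs=[1.0, 0.5], eps=0.1)
    u = filter_control(u_nom=1.0, x_hi=hi[0], x_max=2.0, dt=1.0, u_min=-1.0, u_max=1.0)
    print("estimate bounds:", lo[0], hi[0], "filtered control:", u)
```

The filter acts on the upper bound of the estimate rather than on the nominal estimate, which is the sense in which the control is adjusted for the worst case over all admissible perturbations. Interval bounds are sound but can be loose; dedicated verification tools typically give tighter bounds.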
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Smart Grid Security and Resilience · Formal Methods in Verification
