Certifiers Make Neural Networks Vulnerable to Availability Attacks
Tobias Lorenz, Marta Kwiatkowska, Mario Fritz

TL;DR
This paper reveals that certifiers for neural networks, intended to ensure robustness, can be exploited through training-time attacks to trigger fallback strategies, thereby compromising system availability.
Contribution
The paper introduces the first known availability attacks on neural network certifiers, showing that an adversary can deliberately trigger the fallback strategy and thereby reduce system availability.
Findings
Adding 1% poisoned data to the training set can be enough to trigger the fallback
Attacks are effective across multiple datasets and architectures
Current defenses are insufficient against these attacks
Abstract
To achieve reliable, robust, and safe AI systems, it is vital to implement fallback strategies when AI predictions cannot be trusted. Certifiers for neural networks are a reliable way to check the robustness of these predictions. They guarantee for some predictions that a certain class of manipulations or attacks could not have changed the outcome. For the remaining predictions without guarantees, the method abstains from making a prediction, and a fallback strategy needs to be invoked, which typically incurs additional costs, can require a human operator, or even fail to provide any prediction. While this is a key concept towards safe and secure AI, we show for the first time that this approach comes with its own security risks, as such fallback strategies can be deliberately triggered by an adversary. In addition to naturally occurring abstains for some inputs and perturbations, the…
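The certify-or-abstain flow described in the abstract can be illustrated with a minimal sketch. It is not the certifier studied in the paper: it assumes a plain linear classifier, for which robustness to an L-infinity perturbation of radius `eps` can be checked exactly, and all names (`certify_linear`, `predict_with_fallback`, the `fallback` callable) are hypothetical. The point is only to show where the abstain decision sits and how each abstain shifts load onto the fallback.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def logits(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """Scores of a linear classifier: one row of `weights` per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def certify_linear(
    weights: Matrix, bias: Sequence[float], x: Sequence[float], eps: float
) -> Optional[int]:
    """Return the predicted class if it provably cannot change under any
    L-infinity perturbation of size at most `eps`; return None to abstain."""
    scores = logits(weights, bias, x)
    pred = max(range(len(scores)), key=scores.__getitem__)
    for other in range(len(scores)):
        if other == pred:
            continue
        # Worst case for a linear model: the margin shrinks by
        # eps times the L1 norm of the weight difference.
        diff_l1 = sum(abs(wp - wo) for wp, wo in zip(weights[pred], weights[other]))
        worst_margin = scores[pred] - scores[other] - eps * diff_l1
        if worst_margin <= 0:
            return None  # no guarantee, so abstain
    return pred


def predict_with_fallback(
    weights: Matrix,
    bias: Sequence[float],
    x: Sequence[float],
    eps: float,
    fallback: Callable[[Sequence[float]], int],
) -> Tuple[int, bool]:
    """Return (label, used_fallback). The fallback stands in for a costly
    path such as a human operator (an assumption for illustration)."""
    certified = certify_linear(weights, bias, x, eps)
    if certified is None:
        return fallback(x), True
    return certified, False


def fallback_rate(
    weights: Matrix, bias: Sequence[float], inputs: Sequence[Sequence[float]], eps: float
) -> float:
    """Fraction of inputs on which the certifier abstains."""
    abstains = sum(certify_linear(weights, bias, x, eps) is None for x in inputs)
    return abstains / len(inputs)
```

In this sketch, an input far from the decision boundary is certified, while one close to it is routed to the fallback. An availability attack in the sense of the paper aims to push that fallback rate up deliberately, so the rate returned by `fallback_rate` is the quantity an adversary would try to maximize.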
