Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples
Andrew C. Cullen, Shijie Liu, Paul Montague, Sarah M. Erfani, Benjamin I.P. Rubinstein

TL;DR
This paper introduces a certification-aware attack that exploits neural network robustness certificates to generate more effective adversarial examples, revealing potential security vulnerabilities in certification methods.
Contribution
It presents a novel attack method that leverages certification information to produce more efficient adversarial examples, challenging the assumption that certifications always enhance security.
Findings
The attack produces norm-minimising adversarial examples 74% more often than comparable attacks.
The attack reduces the median perturbation norm by more than 10%.
Releasing certifications can paradoxically decrease model security.
Abstract
In guaranteeing the absence of adversarial examples in an instance's neighbourhood, certification mechanisms play an important role in demonstrating neural net robustness. In this paper, we ask whether these certifications can compromise the very models they help to protect. Our new Certification Aware Attack exploits certifications to produce computationally efficient norm-minimising adversarial examples 74% more often than comparable attacks, while reducing the median perturbation norm by more than 10%. While these attacks can be used to assess the tightness of certification bounds, they also highlight that releasing certifications can paradoxically reduce security.
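Why a released certificate helps an attacker: a certificate of radius r at x guarantees that no perturbation with norm below r changes the prediction, so an attacker who can query it never needs to search inside that ball. The sketch below illustrates only this principle and is not the authors' Certification Aware Attack. It assumes a toy linear binary classifier, whose L2 certified radius (the distance to the decision boundary) is exact. It also assumes a hypothetical `certificate_guided_attack` loop that steps just past the released radius. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def score(w, b, x):
    """Score of a toy linear binary classifier: class +1 if score >= 0, else -1."""
    return dot(w, x) + b


def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1


def certified_radius(w, b, x):
    """Exact L2 certified radius of the toy linear model at x: the distance to
    the decision boundary. No perturbation with a smaller L2 norm flips the label."""
    return abs(score(w, b, x)) / norm(w)


def certificate_guided_attack(w, b, x, overshoot=1e-6, max_iter=20):
    """Hypothetical attack: read the released radius at the current point and
    step just beyond it, against the gradient of the current class score."""
    label = predict(w, b, x)
    w_norm = norm(w)
    x_adv = list(x)
    for _ in range(max_iter):
        if predict(w, b, x_adv) != label:
            return x_adv
        step = certified_radius(w, b, x_adv) + overshoot
        # d(label * score)/dx = label * w, so move along -label * w / ||w||.
        x_adv = [xi - label * step * wi / w_norm for xi, wi in zip(x_adv, w)]
    return None  # no label flip within max_iter steps
```

For example, with `w = [3.0, 4.0]`, `b = -5.0` and `x = [3.0, 4.0]`, the released radius is 4. The attack flips the label after a single step whose L2 norm is about 4, the smallest possible for this model. Practical certificates, such as those from randomized smoothing, are only lower bounds on the distance to the nearest adversarial example. That is why the sketch re-queries the radius and repeats the step in a loop.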
