Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre

Nicholas Carlini

arXiv:2405.03672·cs.CR·July 2, 2024

TL;DR

This paper exposes bugs in the evaluation of the Sabre adversarial example defense and shows that fixing them reduces its robust accuracy to 0%.

Contribution

The paper identifies and fixes evaluation bugs in Sabre, showing that its claimed robustness was due to gradient masking caused by code errors, not genuine security.

Findings

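01 Fixing a single line in the original evaluation code reduces Sabre's robust accuracy to 0%.

02 The authors' later changes to the defense do not restore robustness: modifying one more line reduces robust accuracy to below baseline levels, and commenting out one line during the attack reduces it to 0% again.

03 Flaws in the original evaluation led to an overestimate of Sabre's robustness.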

Abstract

Sabre is a defense to adversarial examples that was accepted at IEEE S&P 2024. We first reveal significant flaws in the evaluation that point to clear signs of gradient masking. We then show the cause of this gradient masking: a bug in the original evaluation code. By fixing a single line of code in the original repository, we reduce Sabre's robust accuracy to 0%. In response to this, the authors modify the defense and introduce a new defense component not described in the original paper. But this fix contains a second bug; modifying one more line of code reduces robust accuracy to below baseline levels. After we released the first version of our paper online, the authors introduced another change to the defense; by commenting out one line of code during attack we reduce the robust accuracy to 0% again.
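The abstract does not reproduce the bug, so the following is only a toy sketch of what gradient masking looks like in general. It is not Sabre's code. The quantizing preprocessor, the linear classifier, and every name in it (`quantize`, `logit`, `input_gradient`, `fgsm`) are assumptions made for illustration. When the exact chain rule is used, the preprocessing step has zero derivative almost everywhere, so a gradient-based attack receives no signal and the model appears robust. Treating the step as the identity on the backward pass (a straight-through approximation) gives the attack a usable gradient.

```python
import numpy as np

def quantize(x, levels=8):
    # Hypothetical preprocessing step: rounding has zero derivative almost everywhere.
    return np.round(x * levels) / levels

def logit(x, w, b):
    # Toy linear classifier applied to the preprocessed input; logit > 0 means "correct".
    return float(quantize(x) @ w + b)

def input_gradient(w, straight_through):
    # Gradient of the logit with respect to the raw input x.
    # Exact chain rule: d(quantize)/dx = 0, so the gradient is all zeros (masked).
    # Straight-through: pretend quantize is the identity on the backward pass.
    return w.copy() if straight_through else np.zeros_like(w)

def fgsm(x, w, eps, straight_through):
    # One signed-gradient step that tries to push the logit below zero.
    g = input_gradient(w, straight_through)
    return np.clip(x - eps * np.sign(g), 0.0, 1.0)

w = np.linspace(-1.0, 1.0, 16)        # fixed toy weights, none equal to zero
x = np.full(16, 0.5)                  # clean input
b = 1.0 - float(quantize(x) @ w)      # bias chosen so the clean logit is +1

x_masked = fgsm(x, w, eps=0.25, straight_through=False)
x_fixed = fgsm(x, w, eps=0.25, straight_through=True)

print("clean logit:          ", logit(x, w, b))
print("attack, masked grads: ", logit(x_masked, w, b))  # input unchanged, looks robust
print("attack, usable grads: ", logit(x_fixed, w, b))   # logit driven negative
```

In this sketch the model itself is identical in both runs; only the gradient the attack sees differs. That is the sense in which a high robust accuracy can reflect a broken attack rather than a strong defense.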

Taxonomy

Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Adversarial Robustness in Machine Learning