TL;DR
This paper evaluates the robustness of recent neural network defenses against adaptive black-box adversarial attacks. It finds that most defenses offer only marginal security improvements and argues that defenses need comprehensive white-box and black-box testing.
Contribution
It provides the first large-scale analysis of nine recent defenses against adaptive black-box attacks, highlighting their limited effectiveness and advocating for more thorough robustness evaluations.
Findings
Seven of the nine defenses offer less than a 25% security improvement over undefended networks.
Attacks become more effective as the adversary gains access to more data.
White-box and black-box analyses are both essential for true security assessment.
Abstract
Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks. They do not properly examine black-box attacks. In this paper, we expand upon the analysis of these defenses to include adaptive black-box adversaries. Our evaluation is done on nine defenses including Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation is done using two black-box adversarial models and six widely studied adversarial attacks for CIFAR-10 and Fashion-MNIST datasets. Our analyses show most recent defenses (7 out of 9) provide only marginal improvements in security (<25%), as compared to undefended networks. For every defense, we also show the relationship between…
