Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence
Hanbin Hong, Xinyu Zhang, Binghui Wang, Zhongjie Ba, and Yuan Hong

TL;DR
This paper introduces a new class of black-box adversarial attacks with provable guarantees on success probability, revealing vulnerabilities of ML models even against state-of-the-art defenses through theoretical and empirical validation.
Contribution
It presents the first certifiable black-box attack framework, which guarantees the attack success probability (ASP) of adversarial examples before they are queried against the target model, advancing the understanding of model vulnerabilities.
Findings
Successfully break SOTA defenses with provable confidence
Construct high-ASP adversarial example spaces
Theoretically guarantee the attack success probability (ASP) before querying the target model
Abstract
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models by iteratively querying the target model or leveraging transferability from a local surrogate model. Recently, however, such attacks have been effectively mitigated by state-of-the-art (SOTA) defenses, e.g., detection via the pattern of sequential queries, or injecting noise into the model. To the best of our knowledge, we take the first step to study a new paradigm of black-box attacks with provable guarantees -- certifiable black-box attacks that can guarantee the attack success probability (ASP) of adversarial examples before querying over the target model. This new black-box attack unveils significant vulnerabilities of machine learning models, compared to traditional empirical black-box attacks, e.g., breaking strong SOTA defenses with provable confidence, constructing a space of (infinite)…
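To make the idea of a lower bound on ASP concrete, the following is a minimal statistical sketch, not the authors' algorithm, whose details this summary does not give. It assumes a hypothetical success oracle `is_adversarial` (here a toy stand-in, not a real target model), Gaussian noise with a hypothetical scale `sigma` around a hypothetical center point, and a one-sided Hoeffding bound in place of whatever estimator the paper actually uses.

```python
import math
import random
from typing import Callable, List, Sequence


def asp_lower_bound(successes: int, trials: int, alpha: float = 0.001) -> float:
    """One-sided Hoeffding lower bound on a Bernoulli success probability.

    With probability at least 1 - alpha over the sampling, the true success
    probability is no smaller than the returned value.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    p_hat = successes / trials
    margin = math.sqrt(math.log(1.0 / alpha) / (2.0 * trials))
    return max(0.0, p_hat - margin)


def estimate_asp(
    is_adversarial: Callable[[List[float]], bool],  # hypothetical success oracle
    center: Sequence[float],                        # hypothetical center of the noise
    sigma: float = 0.5,                             # assumed Gaussian noise scale
    trials: int = 2000,
    alpha: float = 0.001,
    seed: int = 0,
) -> float:
    """Sample randomized examples around `center` and bound their success rate."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        sample = [x + rng.gauss(0.0, sigma) for x in center]
        if is_adversarial(sample):
            successes += 1
    return asp_lower_bound(successes, trials, alpha)


if __name__ == "__main__":
    # Toy oracle: the "attack" succeeds when the coordinates sum to a positive value.
    bound = estimate_asp(lambda x: sum(x) > 0.0, center=[1.0, 1.0])
    print(f"ASP lower bound at confidence 0.999: {bound:.3f}")
```

The Hoeffding bound is chosen here only because it needs nothing beyond the standard library; randomized-smoothing code more commonly uses the tighter Clopper-Pearson interval.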
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI)
Methods: Randomized Smoothing
