Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence

Hanbin Hong, Xinyu Zhang, Binghui Wang, Zhongjie Ba, and Yuan Hong

arXiv:2304.04343 · cs.LG · September 9, 2024 · 1 citation

TL;DR

This paper introduces a new class of black-box adversarial attacks with provable guarantees on attack success probability; theoretical and empirical results show that ML models remain vulnerable even when protected by state-of-the-art defenses.

Contribution

It presents the first certifiable black-box attack framework, which guarantees the attack success probability (ASP) of adversarial examples before querying the target model, advancing the understanding of model vulnerabilities.

Findings

01. Successfully break SOTA defenses with provable confidence
02. Construct adversarial example spaces with high ASP
03. Theoretically guarantee ASP before querying the target model

Abstract

Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models by iteratively querying the target model or leveraging transferability from a local surrogate model. Recently, however, such attacks have been effectively mitigated by state-of-the-art (SOTA) defenses, e.g., detection based on the pattern of sequential queries, or injecting noise into the model. To the best of our knowledge, we take the first step toward a new paradigm of black-box attacks with provable guarantees: certifiable black-box attacks that can guarantee the attack success probability (ASP) of adversarial examples before querying the target model. Compared to traditional empirical black-box attacks, this new black-box attack unveils significant vulnerabilities of machine learning models, e.g., breaking strong SOTA defenses with provable confidence, constructing a space of (infinite)…
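The abstract's central quantity is the attack success probability (ASP) of a randomized adversarial example, and the taxonomy lists randomized smoothing as the method. As a minimal sketch, assuming Gaussian noise, an L2 threat model, and a hypothetical toy classifier `toy_target_model` standing in for the target model, the code below estimates a high-confidence lower bound on ASP by Monte Carlo sampling and then applies the standard Gaussian randomized-smoothing bound to obtain an L2 radius within which ASP provably stays above a target level. The helper names (`asp_lower_bound`, `certified_radius`) and the Hoeffding bound are illustrative choices, not the authors' implementation in datasec-lab/certifiedattack; in particular, this sketch queries the toy model directly and does not reproduce how the paper obtains its guarantee before querying the real target.

```python
import math
import random
from statistics import NormalDist


def toy_target_model(x):
    """Hypothetical stand-in for the black-box target model: a linear rule on 2-D inputs."""
    return 1 if x[0] + x[1] > 0 else 0


def asp_lower_bound(model, x_adv, true_label, sigma, n, alpha, rng):
    """Lower-bound ASP = P[model(x_adv + eps) != true_label], eps ~ N(0, sigma^2 I).

    Uses n Monte Carlo queries and a one-sided Hoeffding bound, so the returned
    value is below the true ASP with probability at least 1 - alpha.
    """
    successes = 0
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x_adv]
        if model(noisy) != true_label:
            successes += 1
    p_hat = successes / n
    return max(0.0, p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n)))


def certified_radius(p_lower, p_target, sigma):
    """L2 radius R such that every x_adv + delta with ||delta||_2 <= R keeps ASP >= p_target.

    Follows from the Gaussian smoothing bound
    ASP(x_adv + delta) >= Phi(Phi^{-1}(p_lower) - ||delta||_2 / sigma).
    """
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must lie strictly between 0 and 1")
    if p_lower <= p_target:
        return 0.0  # no certificate at this target level
    phi_inv = NormalDist().inv_cdf
    return sigma * (phi_inv(p_lower) - phi_inv(p_target))


# Usage: the true label is 0, but noisy copies of x_adv are almost always classified as 1.
rng = random.Random(0)
sigma = 0.5
x_adv = [1.0, 1.0]
p_lower = asp_lower_bound(toy_target_model, x_adv, true_label=0, sigma=sigma,
                          n=2000, alpha=0.001, rng=rng)
radius = certified_radius(p_lower, p_target=0.9, sigma=sigma)
print(f"ASP lower bound: {p_lower:.3f}, certified L2 radius: {radius:.3f}")
```

Hoeffding is used here only to keep the sketch dependency-free; the Clopper-Pearson interval commonly used in randomized smoothing gives a tighter lower bound for the same number of samples.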

Code & Models

Repositories

datasec-lab/certifiedattack (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI)

Methods: Randomized Smoothing