POBA-GA: Perturbation Optimized Black-Box Adversarial Attacks via Genetic Algorithm

Jinyin Chen; Mengmeng Su; Shijing Shen; Hui Xiong; Haibin Zheng

arXiv:1906.03181 · cs.CR · June 10, 2019


TL;DR

This paper introduces POBA-GA, a genetic algorithm-based black-box adversarial attack method that achieves white-box level success rates with improved perturbation control, evaluated through comprehensive metrics and experiments.

Contribution

The paper proposes a novel genetic algorithm approach for black-box adversarial attacks that rivals white-box attack performance and enhances perturbation management.

Findings

1. POBA-GA outperforms existing black-box attacks in success rate.

2. The method achieves comparable results to white-box attacks.

3. Enhanced perturbation control improves attack stealthiness.


Tables

Table 1: Attributes of the computer vision adversarial attack methods.

Model	Black/White-box	Targeted/Non-targeted	Specific/Universal	Perturbation norm	Learning
L-BFGS [18]	White-box	Targeted	Image specific	$L_{\infty}$	One shot
FGSM [36]	White-box	Targeted	Image specific	$L_{\infty}$	One shot
BIM & ILCM [27]	White-box	Non-targeted	Image specific	$L_{\infty}$	Iterative
JSMA [22]	White-box	Targeted	Image specific	$L_{0}$	Iterative
One-pixel [24]	Black-box	Non-targeted	Image specific	$L_{0}$	Iterative
C&W [37]	White-box	Targeted	Image specific	$L_{0}, L_{2}, L_{\infty}$	Iterative
DeepFool [23]	White-box	Non-targeted	Image specific	$L_{2}, L_{\infty}$	Iterative
Uni.perturbations [38]	White-box	Non-targeted	Universal	$L_{2}, L_{\infty}$	Iterative
UPSET [54]	Black-box	Targeted	Universal	$L_{\infty}$	Iterative
ANGRI [54]	Black-box	Targeted	Image specific	$L_{\infty}$	Iterative
ZOO [50]	Black-box	Non-targeted	Image specific	$L_{2}$	Iterative
Boundary [26]	Black-box	Targeted	Image specific	$L_{2}$	Iterative
Limited [25]	Black-box	Non-targeted	Image specific	$L_{\infty}$	Iterative
MI-FGSM [52]	Both	Both	Image specific	$L_{2}, L_{\infty}$	Iterative
AutoZOOM [58]	Black-box	Both	Image specific	$L_{2}$	Iterative

Table 2: The symbols used in the paper.

$S$	the original example
$AS_{opt}$	the approximate optimal adversarial example of POBA-GA
$AS^{t}$	the collection of $t^{th}$ iteration adversarial examples (when $t = 0$, it represents the initial adversarial examples)
$AS_{i}^{t}$	the $t^{th}$ iteration of the $i^{th}$ adversarial example
$AS_{i(ab)}^{t}$	the pixel in the $a^{th}$ row and $b^{th}$ column of $AS_{i}^{t}$
$A^{t}$	the collection of perturbations of the $t^{th}$ iteration
$A_{i}^{t}$	the $t^{th}$ iteration of the $i^{th}$ perturbation
$TM$	the target model
$L_{0}, L_{2}, L_{\infty}$	zero / two / infinity norm
$\phi(AS_{i}^{t})$	the fitness function of $AS_{i}^{t}$
$P(AS_{i}^{t})$	the attack performance of $AS_{i}^{t}$ in $\phi(.)$
$Z(A_{i}^{t})$	the perturbation evaluation of $A_{i}^{t}$ in $\phi(.)$
$\alpha$	the perturbation ratio parameter in $\phi(.)$
$f(AS_{i}^{t})$	the selection probability of $AS_{i}^{t}$
$fr(AS_{i}^{t})$	the cumulative probability of $AS_{i}^{t}$
$p(y|AS_{i}^{t})$	the confidence of $AS_{i}^{t}$ labeled as $y$
$y_{0}$	the true label of the original example $S$
$y_{1}, y_{2}$	the labels for $S$ with the highest and second highest confidence
$y_{tar}$	the preset label for a targeted attack
$B, C$	the two-dimensional matrices used in crossover / mutation
$ASR$	the Attack Success Rate
$P_{c}, P_{m}$	the crossover / mutation probability

Table 3: Perturbations evaluation metrics.

	$A S_{1}^{t}$	$A S_{2}^{t}$	$A S_{3}^{t}$	$A S_{4}^{t}$	$A S_{5}^{t}$	$A S_{6}^{t}$
$ϕ (A S_{i}^{t})$	0.9	0.45	0.6	0.79	0.95	0.85
$f (A S_{i}^{t})$	0.20	0.10	0.13	0.17	0.21	0.19
$f r (A S_{i}^{t})$	0.20	0.30	0.43	0.60	0.81	1
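
The selection probability $f$ and cumulative probability $fr$ in Table 3 follow from the fitness values by normalization and a running sum. The sketch below reproduces the rounded table values and shows one way the cumulative probabilities could drive roulette-wheel selection. The function names are hypothetical, and the sketch assumes all fitness values are positive, which the fitness definition itself does not guarantee.

```python
from bisect import bisect_left
from itertools import accumulate
import random


def selection_probabilities(fitness_values):
    """Return (f, fr): per-individual selection and cumulative probabilities."""
    total = sum(fitness_values)
    f = [phi / total for phi in fitness_values]
    fr = list(accumulate(f))
    fr[-1] = 1.0  # guard against floating-point drift in the running sum
    return f, fr


def roulette_select(fr, rng):
    """Return the index of the first individual whose cumulative probability covers r."""
    r = rng.random()  # r lies in [0, 1)
    return bisect_left(fr, r)


table3_fitness = [0.9, 0.45, 0.6, 0.79, 0.95, 0.85]  # first row of Table 3
f, fr = selection_probabilities(table3_fitness)
print([round(x, 2) for x in f])   # [0.2, 0.1, 0.13, 0.17, 0.21, 0.19]
print([round(x, 2) for x in fr])  # [0.2, 0.3, 0.43, 0.6, 0.81, 1.0]

rng = random.Random(0)
picked = [roulette_select(fr, rng) for _ in range(6)]
```

Individuals with larger fitness own a wider slice of the interval [0, 1), so they are drawn more often, while weaker individuals still keep a nonzero chance of being selected.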

Table 4: Influence of disturbance metric on POBA-GA in VGG19

Metric	Attack success rate	Query count	Perturbation (per-pixel $L_{2}$)
$L_{2}$	96%	5000	5.7e-06
$Z$	96%	5000	4.3e-06

Table 5: Attack performance comparison.

Metric	Black/White-box	Attack method	MNIST	CIFAR-10	ImageNet64 (VGG19)	ImageNet64 (Resnet50)	ImageNet64 (Inc-V3)
Perturbation (per-pixel $L_{2}$ )	White-box Adversarial attack	FGSM	6.5e-02	7.3e-05	3.4e-06	4.0e-06	2.6e-06
		DeepFool	3.2e-03	4.1e-06	2.4e-07	9.3e-08	9.1e-08
		BIM	8.2e-03	1.2e-05	8.5e-07	8.2e-07	6.4e-07
		C&W	3.1e-03	6.9e-06	5.7e-07	2.2e-07	7.6e-08
	Black-box Adversarial attack	Boundary	4.0e-03	6.4e-06	3.5e-07	2.1e-07	4.2e-07
		ZOO	4.3e-03	5.8e-04	3.9e-05	3.2e-05	2.8e-05
		AutoZOOM	6.4e-03	7.2e-04	6.2e-05	5.1e-05	5.4e-05
		POBA-GA	3.0e-03	6.8e-05	1.5e-05	1.4e-05	1.7e-05
Attack success rate	White-box Adversarial attack	FGSM	86%	89%	77%	80%	82%
		DeepFool	90%	87%	75%	79%	75%
		BIM	98%	98%	97%	96%	94%
		C&W	100%	100%	99%	100%	99%
	Black-box Adversarial attack	Boundary	78%	76%	72%	70%	70%
		ZOO	100%	100%	90%	90%	88%
		AutoZOOM	100%	100%	100%	100%	100%
		POBA-GA	100%	100%	96%	98%	95%
Query count	Black-box Adversarial attack	Boundary	12500	12500	125000	125000	125000
		ZOO	9250	4324	235272	223143	264170
		AutoZOOM	445(100)	103(86)	4647(1686)	4256(1695)	4051(1701)
		POBA-GA	423(94)	381(78)	3786(536)	3614(492)	3573(471)

Table 6: The performance of POBA-GA on different outcome models

	Attack success rate	Query count (initial success)
RCC	98%	536
RSB	73%	6276

Table 7: The defensive comparison of ensemble adversarial training and POBA-GA adversarial training.

Defense method	Attack category	Algorithm	MNIST	CIFAR-10	ImageNet64 (VGG19)	ImageNet64 (Resnet50)	ImageNet64 (Inc-V3)
Ensemble adversarial training	White-box Adversarial attack	FGSM	16%	15%	20%	22%	22%
		DeepFool	15%	10%	10%	8%	10%
		BIM	14%	14%	18%	18%	20%
		C&W	18%	18%	22%	20%	20%
	Black-box Adversarial attack	Boundary	38%	36%	52%	50%	50%
		ZOO	64%	60%	78%	80%	78%
		AutoZOOM	60%	59%	79%	76%	78%
		POBA-GA	70%	72%	88%	86%	85%
POBA-GA adversarial training	White-box Adversarial attack	FGSM	32%	35%	50%	54%	52%
		DeepFool	33%	31%	36%	34%	38%
		BIM	35%	33%	42%	44%	40%
		C&W	32%	35%	52%	50%	51%
	Black-box Adversarial attack	Boundary	37%	38%	51%	52%	54%
		ZOO	44%	42%	74%	74%	72%
		AutoZOOM	46%	40%	76%	73%	76%
		POBA-GA	26%	28%	34%	32%	32%

Equations

$$\phi(AS^{t}_{i})=P(AS^{t}_{i})-\frac{\alpha}{\max Z(A^{0})}Z(A^{t}_{i}) \tag{1}$$

$$P(AS^{t}_{i})=\begin{cases}p(y_{1}|AS^{t}_{i})-p(y_{0}|AS^{t}_{i}) & y_{1}\neq y_{0}\\ p(y_{2}|AS^{t}_{i})-p(y_{0}|AS^{t}_{i}) & y_{1}=y_{0}\end{cases} \tag{2}$$

$$\phi(AS^{t}_{i})=\begin{cases}p(y_{1}|AS^{t}_{i})-p(y_{0}|AS^{t}_{i})-\frac{\alpha}{\max Z(A^{t_{0}})}Z(A^{t}_{i}) & y_{1}\neq y_{0}\\ p(y_{2}|AS^{t}_{i})-p(y_{0}|AS^{t}_{i}) & y_{1}=y_{0}\end{cases} \tag{3}$$

$$f(AS^{t}_{i})=\frac{\phi(AS^{t}_{i})}{\sum_{j=1}^{n}\phi(AS^{t}_{j})} \tag{4}$$

$$fr(AS^{t}_{i})=\sum_{j=1}^{i}f(AS^{t}_{j}) \tag{5}$$

$$A^{1}_{cross}=\begin{cases}A^{t}_{i}*B+A^{t}_{j}*(1-B) & rand(0,1)<P_{c}\\ A^{t}_{i} & \text{otherwise}\end{cases} \tag{6}$$

$$A^{2}_{cross}=\begin{cases}A^{t}_{i}*(1-B)+A^{t}_{j}*B & rand(0,1)<P_{c}\\ A^{t}_{j} & \text{otherwise}\end{cases} \tag{7}$$

$$A^{t+1}_{i^{\prime}+q-1}=\begin{cases}A^{q}_{cross}*C & rand(0,1)<P_{m}\\ A^{q}_{cross} & \text{otherwise}\end{cases} \tag{8}$$

$$Z(A^{t}_{i})=\sum_{a=1}^{m_{a}}\sum_{b=1}^{m_{b}}\left(\frac{1}{1+e^{-|AS^{t}_{i(ab)}|*pm_{1}+pm_{2}}}-\frac{1}{1+e^{pm_{2}}}\right) \tag{9}$$

$$ASR=\begin{cases}\frac{sumNum(AS_{opt}|y_{1}=y_{tar})}{sumNum(AS_{opt})} & \text{targeted attack}\\ \frac{sumNum(AS_{opt}|y_{1}\neq y_{0})}{sumNum(AS_{opt})} & \text{non-targeted attack}\end{cases} \tag{10}$$

$$\widehat{P}(AS^{t}_{i})=\frac{1}{N^{\prime}}\sum_{i=1}^{N^{\prime}}R(AS^{t}_{i}+\delta) \tag{11}$$

$$LR=BLR*e^{-\frac{i*\ln(0.1/MLR)}{STEPS}} \tag{12}$$
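
The crossover and mutation equations can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the listing does not say how the matrices $B$ and $C$ are built, so here $B$ is assumed to be a uniformly random 0/1 mask and $C$ a matrix of ones with a few zeros, `*` is read as an element-wise product, and a single random draw decides crossover for both children. The names `crossover`, `mutate` and `zero_rate` are hypothetical, and plain nested lists stand in for image arrays.

```python
import random


def crossover(a_i, a_j, p_c, rng):
    """Mix two parent perturbations pixel by pixel with a random binary mask B."""
    if rng.random() >= p_c:
        # No crossover: both parents pass through unchanged.
        return [row[:] for row in a_i], [row[:] for row in a_j]
    # Assumption: B is a uniformly random 0/1 matrix with the parents' shape.
    mask = [[rng.randint(0, 1) for _ in row] for row in a_i]
    child1 = [[x * b + y * (1 - b) for x, y, b in zip(ri, rj, rb)]
              for ri, rj, rb in zip(a_i, a_j, mask)]
    child2 = [[x * (1 - b) + y * b for x, y, b in zip(ri, rj, rb)]
              for ri, rj, rb in zip(a_i, a_j, mask)]
    return child1, child2


def mutate(a_cross, p_m, rng, zero_rate=0.05):
    """With probability p_m, multiply the perturbation element-wise by a matrix C."""
    if rng.random() >= p_m:
        return [row[:] for row in a_cross]
    # Assumption: C is mostly ones, with a few zeros that erase perturbation pixels.
    return [[px * (0 if rng.random() < zero_rate else 1) for px in row]
            for row in a_cross]


rng = random.Random(0)
parent_i = [[0.1, 0.2], [0.3, 0.4]]
parent_j = [[-0.1, -0.2], [-0.3, -0.4]]
child1, child2 = crossover(parent_i, parent_j, p_c=1.0, rng=rng)
next_generation = [mutate(c, p_m=0.5, rng=rng) for c in (child1, child2)]
```

With a binary mask, every pixel of the two children comes from exactly one of the two parents, so crossover recombines existing perturbation values, while the assumed mutation can only remove perturbation pixels.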


Full text


Zhejiang University of Technology, Hangzhou 310023, China

Abstract

Most deep learning models are vulnerable to adversarial attacks. Various adversarial attacks have been designed to evaluate the robustness of models and to develop defense models. Currently, each adversarial attack is proposed against its own target model and evaluated with its own metrics, and most black-box adversarial attack algorithms cannot achieve the success rate of white-box attacks. In this paper, comprehensive evaluation metrics are proposed for different adversarial attack methods. A novel perturbation optimized black-box adversarial attack based on genetic algorithm (POBA-GA) is proposed to achieve attack performance comparable to white-box attacks. Approximately optimal adversarial examples are evolved through evolutionary operations including initialization, selection, crossover and mutation. The fitness function is specifically designed to evaluate each individual example in terms of both attack ability and perturbation control. A population diversity strategy is introduced in the evolutionary process to help ensure that approximately optimal perturbations are obtained. Comprehensive experiments are carried out to test POBA-GA's performance. Both simulation and application results show that our method is better than current state-of-the-art black-box attack methods in terms of attack capability and perturbation control.

Keywords: deep learning, adversarial attack, perturbation optimization, genetic algorithm.


1 Introduction

Deep learning is the core of current machine learning and artificial intelligence [1]. Because of its powerful learning, feature extraction and modeling capabilities, it has been widely applied to challenging areas such as social networks [2], medical image analysis [3, 4, 5] and selective classification [6, 7]. In computer vision, deep learning has become the main force behind applications such as self-driving cars [8], image processing [9, 10], target-driven visual navigation [11] and scene recognition [12].

Recent research shows that although deep learning can extract rich image features and make accurate predictions or classifications, it can easily be fooled into erroneous outputs by adversarial examples, which are created by adding small perturbations to the original image [13, 14, 15, 16, 17]. The adversarial attack was first proposed by Szegedy et al. [18] and has since attracted growing attention. As long as deep models are threatened by adversarial attacks, many applications built on them cannot be deployed with confidence. For instance, payment based on facial recognition cannot be trusted if adversarial glasses let one person impersonate another [19], and autonomous driving based on image recognition is dangerous if a road sign is a carefully designed adversarial example [20]. In general, the robustness of a classifier can be assessed by simulating an attacker's efforts to evade it [21], so defending against adversarial attacks is also a way to increase the robustness of the model. Many adversarial attack methods have been proposed for understanding attacks and improving model defenses, such as the Jacobian-based Saliency Map Attack (JSMA) [22], DeepFool [23], the One-pixel Attack [24] and the Limited Queries and Information Attack [25].

Adversarial attacks can be roughly divided into three categories: gradient-based, score-based and transfer-based attacks [26]. Gradient-based and score-based attacks are often denoted as white-box and oracle attacks respectively. Most existing gradient-based white-box attacks, such as the Basic Iterative Method (BIM) [27], Houdini [28] and DeepFool [23], rely on detailed model information and exploit the gradient of the loss. There are also black-box attacks that rely on transferability across models [29] or require access to the full training data set [30]. Since most real-world systems do not publish the network's weights, architecture or training data, white-box attacks and equivalent-model attacks are not easy to carry out in practice. A model is still at risk, however, if an attacker can mount a black-box attack without knowing the internal configuration of the target model. For this reason we develop an adversarial attack that is completely independent of internal model information. It not only fills the gap of evolutionary adversarial attacks with minimal perturbation, but also helps evaluate and improve the defenses of current state-of-the-art deep models.

Excellent adversarial examples generally have two characteristics. First, they are very similar to the original image: the perturbations are slight and barely distinguishable by the naked eye. Second, they lead the target model to misclassify with high confidence. Black-box attacks are conducted without any internal model information (structure and parameters), using only outputs such as the most probable class label or the confidence. The goal of a black-box attack is to reduce the confidence of the true label with limited perturbation, so in most cases it can be modeled as an optimization problem. Genetic algorithms are widely applied as a typical optimization tool, for example in energy optimization [31], distribution network optimization [32], ontology alignment optimization [33] and web crawlers [34], where they achieve good optimization performance. In this paper, a novel adversarial perturbation optimization attack based on genetic algorithm is proposed to implement black-box attacks. The fitness function is constructed from classification confidence and perturbation size, and the genetic operations are designed to yield an approximately optimal adversarial example.

Current adversarial attacks generally use $L_{0}$, $L_{2}$ and $L_{\infty}$ as evaluation metrics for perturbations. However, different attack algorithms use different perturbation metrics according to their own characteristics [35]. For example, L-BFGS [18] and FGSM [36] use $L_{\infty}$, JSMA [22] and One-pixel [24] use $L_{0}$, and DeepFool [23] uses $L_{2}$. As a result, algorithms can only be evaluated from different perspectives, and it is not possible to judge which perturbation or algorithm is better. Based on the sensitivity of the human visual system to perturbation, we put forward a new perturbation assessment method that evaluates the perturbation comprehensively and reflects perceived sensitivity more closely.

The main contributions of our work can be summarized as follows:

1. Perturbation optimization. POBA-GA is a novel perturbation optimization method for generating black-box adversarial examples. It achieves a high attack success rate against different deep learning models with controllable perturbations.

2. White-box comparable black-box attack. POBA-GA optimizes the perturbations based on the confidence output by the black-box model. In most cases, it achieves an attack success rate comparable to white-box attack methods.

3. New perturbation evaluation metric. A novel perturbation evaluation metric is put forward. It evaluates the perturbation comprehensively and maps its size to different dimensions, so that the evaluation result reflects the perturbation more realistically.

4. Improved defense capability through adversarial training. Adversarial examples generated by POBA-GA are used to train deep models to improve their defense capability. Experiments show that adversarial training on examples from POBA-GA defends the model better than training on examples from other attack methods.

The rest of the paper is organized as follows. Related works are discussed in Sec. 2. The main methods and strategies are introduced in Sec. 3. Experiments and conclusions are presented in Sec. 5 and Sec. 6.

2 Related Works

In this section, we introduce classic adversarial attack methods, genetic algorithms and perturbation evaluation metrics.

2.1 Attack methods

Since Szegedy et al. [18] proposed the concept of adversarial attacks against deep models, a large number of adversarial attacks have been put forward [36, 37, 23, 38, 39]. Some researchers have launched attacks on applications such as speech recognition systems [40], malware detectors [41, 21] and face recognition systems [42]. For example, Mengying Sun et al. [43] use adversarial attacks against deep predictive models to identify susceptible locations in medical records, and Tegjyot Singh Sethi et al. [44] present an adversary's view of a classification-based system. Attacks are generally classified as white-box or black-box according to whether the attacker knows the internal structure of the target model. Because the literature on adversarial attacks is so extensive, this paper mainly introduces algorithms from computer vision.

2.1.1 White-box Adversarial Attack Models

A white-box attack attacks a target model with knowledge of its internal structure. White-box attacks are the most common kind of attack, and they can also be run against an equivalent model to achieve a black-box attack. Researchers have proposed a large number of white-box attacks [36, 27, 22, 23, 37]. For example, Bose et al. proposed adversarial attacks on face detectors using neural net based constrained optimization [45]. Yu et al. proposed a fast adversarial example generation framework based on adversarial saliency prediction [46]. Chen et al. proposed a robust physical adversarial attack on the Faster R-CNN object detector [47]. Ramanathan et al. proposed adversarial attacks on computer vision algorithms using natural perturbations [14].

The perturbation optimization algorithm in this paper takes adversarial examples generated by white-box attacks as part of the initial solution and optimizes the perturbation through the genetic algorithm. To ensure the diversity of the initial solution, we chose seven different attack methods to generate it. The reasons for choosing these methods are described in detail in Sec. 3. The attack methods are described below.

Fast Gradient Sign Method (FGSM) [36] FGSM is one of the simplest and most widely used non-targeted adversarial attacks. It uses the gradient back-propagated through the target DNN to generate adversarial examples. The perturbation is computed as $\rho=\epsilon\, sign(\nabla J(\theta,I_{c},l))$ [1], where $\nabla J$ denotes the gradient of the cost function with respect to the original image $I_{c}$, evaluated at the current model parameters $\theta$, and $l$ is the label of $I_{c}$. $sign(.)$ denotes the sign function, and $\epsilon$ is a small scalar value that limits the size of the perturbation.
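
The FGSM step can be made concrete on a model whose gradient is available in closed form. The sketch below is an illustrative assumption, not the setting used in the paper: the target is a two-feature logistic-regression classifier with cross-entropy loss, for which the gradient of the loss with respect to the input is $(p - l)\,w$. The names `predict_proba` and `fgsm` are hypothetical.

```python
import math


def predict_proba(x, weights, bias):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(x, label, weights, bias, epsilon):
    """One FGSM step: x + epsilon * sign(dJ/dx) for cross-entropy loss J."""
    p = predict_proba(x, weights, bias)
    # For logistic regression with cross-entropy loss, dJ/dx_i = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


weights, bias = [2.0, -1.0], 0.0
x, label = [1.0, 0.5], 1
x_adv = fgsm(x, label, weights, bias, epsilon=0.1)
print(x_adv)
print(predict_proba(x, weights, bias), predict_proba(x_adv, weights, bias))
```

Every input coordinate moves by exactly $\epsilon$ in the direction that increases the loss, so the confidence in the true class drops while the $L_{\infty}$ size of the perturbation stays at $\epsilon$.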

Basic Iterative Method (BIM) [27] BIM, equivalent to Projected Gradient Descent, is a standard convex optimization method. It extends the single-step method by taking multiple small steps and adjusting the direction after each step. After a sufficient number of iterations, BIM can generate an adversarial example that is classified into the target label.

Jacobian-based Saliency Map Attack (JSMA) [22] JSMA describes the input-output relationship of the target DNN by constructing a Jacobian-based saliency map. It iteratively modifies the most important pixels according to the saliency map in order to fool the network. At each iteration, JSMA recalculates the saliency map and uses the derivative of the DNN with respect to the input image to decide which pixels to modify. This greedy search is repeated until the number of changed pixels reaches the threshold or the deception succeeds.

DeepFool [23] DeepFool is a simple but very effective non-targeted attack. In each iteration, it computes, for each label $y_{1}\neq y_{0}$, the minimum distance $d(y_{1},y_{0})$ required to reach the class boundary by approximating the classifier with a linear model. $y_{1}$ represents the label assigned to the adversarial example by the deep model, and $y_{0}$ represents the true label of the adversarial example. It then takes an appropriate step in the direction of the nearest class. The image perturbations of each iteration are accumulated, and the final perturbation is obtained once the output label changes.

Carlini and Wagner Attacks (C&W) [37] The Carlini and Wagner attack is one of the strongest attacks. In essence it is a refined iterative gradient attack that uses the Adam optimizer. It uses the internal configuration of the target DNN to guide the attack, and uses the $L_{2}$ norm to quantify the difference between the adversarial and original images.

Gaussian Blur [48] Gaussian blur is a linear smoothing filter that is suitable for eliminating Gaussian noise and is widely used for noise reduction in image processing. Gaussian filtering is a weighted averaging of the entire image: the value of each pixel is obtained as a weighted average of its own value and the values of the other pixels in its neighborhood. However, Gaussian filtering also causes the image to lose certain features, which can lead the CNN to misclassify it.

Salt and Pepper Noise [49] Salt and pepper noise, also known as impulsive noise, randomly changes some pixel values. On a binary image it turns some pixels white (salt) or black (pepper). Such noise has a certain probability of causing the CNN to misclassify the image.
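
As an illustration of the simplest of these perturbations, the following sketch adds salt and pepper noise to a small grayscale image. It assumes pixel values in [0, 1] and an equal chance of salt and pepper; the function name and the `amount` parameter are hypothetical, and a nested list stands in for an image array.

```python
import random


def salt_and_pepper(image, amount, rng, low=0.0, high=1.0):
    """Return a copy of a 2-D image in which about `amount` of the pixels are set to low or high."""
    noisy = [row[:] for row in image]
    for row in noisy:
        for b in range(len(row)):
            if rng.random() < amount:
                # Salt (high) and pepper (low) are assumed to be equally likely.
                row[b] = high if rng.random() < 0.5 else low
    return noisy


rng = random.Random(0)
image = [[0.5 for _ in range(8)] for _ in range(8)]
noisy = salt_and_pepper(image, amount=0.1, rng=rng)
changed = sum(px != 0.5 for row in noisy for px in row)
print(changed, "of 64 pixels changed")
```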

2.1.2 Black-box Adversarial Attack Models

A black-box attack can only access the input and output of the target model, not its internal configuration. For an image classifier trained as a CNN, the model takes an image as input and produces a confidence score for each label as output [50]. In practical applications most target models are black boxes, so black-box attacks have important research significance and have been studied by many researchers [51, 52, 53]. For example, Milton et al. [53] evaluated the momentum diverse input iterative fast gradient sign method (M-DI2-FGSM) for attacking a black-box facial recognition system. Dong et al. [52] propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. Brendel et al. [26] introduce the Boundary Attack, which opens new avenues for studying the robustness of machine learning models and raises new questions about the safety of deployed machine learning systems. Andrew et al. [25] proposed black-box adversarial attacks with limited queries and information. Among current black-box attacks, ZOO and Boundary are the state-of-the-art methods, and both aim to improve the attack success rate.

2.2 Genetic algorithm

Existing attack methods can, to some extent, be regarded as solutions to an optimization problem. For example, ZOO uses a zeroth order method to optimize the black-box attack [50]. UPSET and ANGRI use the so-called UPSET network to optimize the black-box attack [54]. Houdini optimizes for mean per-pixel or per-class accuracy instead of mIoU for better experimental results [28]. Genetic algorithms have been widely used in various optimization problems with good results, for example structure location optimization [55], distribution network optimization [56] and relay-related optimization [57]. Therefore, we apply a genetic algorithm to perturbation optimization to generate high-quality adversarial examples. Candidate solutions are initialized as a population, and the fitness of each individual is calculated. Selection, crossover and mutation operators are applied with certain probabilities to update the whole population until the termination condition is met, yielding an approximately optimal solution to the problem.
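
The generic loop described above (initialize, evaluate fitness, select, cross over, mutate, repeat) can be sketched on a toy problem. This is a minimal generic genetic algorithm under stated assumptions, not POBA-GA itself: individuals are short lists of floats, selection is fitness-proportional (so fitness must be positive), crossover is uniform, mutation adds Gaussian noise to one gene, and termination is a fixed number of generations. All names and parameter values are hypothetical.

```python
import random


def run_ga(fitness, init_population, p_c=0.8, p_m=0.1, generations=50, seed=0):
    """Evolve a population of float lists and return the best individual seen."""
    rng = random.Random(seed)
    population = [ind[:] for ind in init_population]  # size assumed to be even
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        # Selection: fitness-proportional sampling with replacement.
        parents = rng.choices(population, weights=scores, k=len(population))
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_c:  # uniform crossover: swap genes at random
                for k in range(len(a)):
                    if rng.random() < 0.5:
                        a[k], b[k] = b[k], a[k]
            children += [a, b]
        for child in children:
            if rng.random() < p_m:  # mutation: perturb one random gene
                k = rng.randrange(len(child))
                child[k] += rng.gauss(0.0, 0.1)
        population = children
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(ind):
    """Positive fitness that peaks at the point (1.0, -2.0)."""
    return 1.0 / (1.0 + sum((x - t) ** 2 for x, t in zip(ind, (1.0, -2.0))))


init_rng = random.Random(1)
start = [[init_rng.uniform(-5, 5) for _ in range(2)] for _ in range(20)]
best = run_ga(toy_fitness, start)
print(best, toy_fitness(best))
```

Tracking the best individual seen so far means the returned solution is never worse than the best member of the initial population, even if a later generation loses it.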

2.3 Perturbation evaluation metrics

Current adversarial attacks use $L_{0}$, $L_{2}$ and $L_{\infty}$ to evaluate the size of the perturbation. The $L_{p}$ distance is used as a measure of similarity [39]. The $L_{0}$ distance counts the number of pixels changed, the $L_{2}$ distance is the Euclidean distance between the two examples, and the $L_{\infty}$ distance is the largest change made to any single pixel. Table 1 lists the perturbation metrics and other attributes of current adversarial attack methods in computer vision. Because research on adversarial attacks is very active, we are unable to list all the literature, so the table only includes popular methods or methods representative of popular directions in computer vision.
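
The three norms can be computed directly from the pixel-wise difference between the original and adversarial examples. The sketch below is illustrative and treats each image as a flat list of pixel values; the function name is hypothetical.

```python
import math


def perturbation_norms(original, adversarial):
    """Return (L0, L2, Linf) of the perturbation adversarial - original."""
    diff = [a - s for s, a in zip(original, adversarial)]
    l0 = sum(1 for d in diff if d != 0)      # number of changed pixels
    l2 = math.sqrt(sum(d * d for d in diff))  # Euclidean distance
    linf = max(abs(d) for d in diff)          # largest single-pixel change
    return l0, l2, linf


original = [0.0, 0.0, 0.0, 0.0]
adversarial = [0.0, 3.0, 0.0, -4.0]
print(perturbation_norms(original, adversarial))  # (2, 5.0, 4.0)
```

The example shows why the metrics are hard to compare: the same perturbation is small under $L_{0}$ (two pixels) but large under $L_{\infty}$ (a change of 4 in one pixel).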

3 Model

3.1 Problem definition

For a given example $S$, a Deep Neural Network (DNN) is applied to classify $S$ and output a label. An attack method $AM$ is adopted to generate a perturbation $A$ that is added to $S$, producing, directly or through optimization, an adversarial example $AS$ that attacks the target model $TM$ so that it outputs a wrong label. In the field of image recognition, $S$ represents the original image, and the adversarial example $AS$ generated by $AM$ is $S$ with a negligible perturbation $A$ added.

3.1.1 DEFINITION 1 (DNN based image label)

A Deep Neural Network (DNN) is trained on a large number of labeled images. For a given image $S$, the DNN outputs a label $y_{1}$, written $TM(\Theta,S)=y_{1}$, where $\Theta$ represents the model parameters and $y_{1}$ is the label with the highest confidence.

3.1.2 DEFINITION 2 (Adversarial attack)

Given a DNN whose output for $S$ is $TM(\Theta,S)=y_{0}$, an attack method $AM$ generates an adversarial image $AS$ such that $TM(\Theta,AS)=y_{1}$ with $y_{0}\neq y_{1}$, where $S$ and $AS$ are almost indistinguishable.

An adversarial attack is carried out with adversarial examples. Most adversarial examples are generated by adding perturbations to the original image, and the quality of the perturbation determines the attack capability. An effective black-box attack can be viewed as generating a high-quality perturbation without knowing the internal structure of the $TM$, causing the $TM$ to output a wrong class, as illustrated in Figure 1. Generating a black-box adversarial example can therefore be treated as an optimization problem.

3.2 Framework

We propose a novel Perturbation Optimization black-box Attack based on Genetic Algorithm (POBA-GA). It takes a variety of different random perturbations as the initial examples and uses the fitness function $\phi(.)$ to optimize the perturbation, obtaining an approximately optimal adversarial example $AS_{opt}$. The block diagram of POBA-GA is shown in Figure 2, and the symbols used in the paper are listed in Table 2.

Figure 2 shows how a high-quality adversarial example is generated through the genetic algorithm. First, we generate different perturbations based on different noise point pixel thresholds, numbers of noise points and noise point sizes, and add each perturbation to the original example $S$ to generate the corresponding initial adversarial example $AS^{t=0}$, where $t=0$ denotes the first generation of the population. Second, we define a fitness function $\phi(AS^{t}_{i})$ to evaluate the $t^{th}$ iteration example $AS^{t}_{i}\in AS^{t}$, $i\in[0,n]$. Third, we check whether the termination condition has been reached. Fourth, we apply the typical genetic operators, selection, crossover and mutation, to evolve a new generation for perturbation optimization.

3.3 Initialization

Initialization generates the initial solutions. The quality of the initial solutions directly affects the number of iterations: if an initial solution is close to the approximately optimal solution, the algorithm converges quickly; otherwise it needs more iterations to converge to the approximate global optimum. Besides the quality of each initial solution, the diversity of the initial solutions is also crucial. It has been shown that diversity in the initial population of a genetic algorithm helps it reach the approximate global optimum [59].

For a given example $S$, the optimization goal is to evolve an approximately optimal adversarial example $AS_{opt}$. In random perturbation initialization, we use a search distribution of random Gaussian noise around the original image $S$ [25], described as $AS^{t=0}=S+\delta$, where $\delta\sim\mathcal{N}(\mu,\sigma^{2})$, $\mu$ is the mean and $\sigma^{2}$ is the variance. To increase the diversity of the initial perturbations, this paper generates different types of initial perturbation based on different variances, numbers of noise points and noise point sizes.
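
The Gaussian initialization $AS^{t=0}=S+\delta$ can be sketched as follows. The sketch only varies the standard deviation to diversify the population; the number and size of noise points mentioned in the text are not modeled. Clipping to the pixel range [0, 1] is an added assumption, and the names `init_population` and `sigmas` are hypothetical.

```python
import random


def init_population(image, sigmas, rng, mu=0.0, low=0.0, high=1.0):
    """Return one noisy copy of `image` per sigma, i.e. S + delta with delta ~ N(mu, sigma^2)."""
    population = []
    for sigma in sigmas:
        noisy = [[min(high, max(low, px + rng.gauss(mu, sigma))) for px in row]
                 for row in image]  # clip to the valid pixel range (assumption)
        population.append(noisy)
    return population


rng = random.Random(0)
image = [[0.2, 0.4], [0.6, 0.8]]
population = init_population(image, sigmas=[0.01, 0.05, 0.1, 0.2], rng=rng)
print(len(population), "initial adversarial examples")
```

Using several values of $\sigma$ gives a population that mixes nearly invisible perturbations with stronger ones, which is one simple way to obtain the diversity the text asks for.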

3.4 Fitness function

The fitness function evaluates the quality of examples in the GA. The choice of fitness function directly affects the convergence speed of the genetic algorithm and whether it can find the optimal solution.

Excellent adversarial examples generally have two characteristics. First, they are very similar to the original image: the perturbation is slight and barely distinguishable by the naked eye. Second, they are misclassified by the target model with high confidence. Therefore, the fitness function should depend on both the confidence and the perturbation size, as in Eq. 1.

[TABLE]

where $\phi(AS^{t}_{i})$ is the fitness of example $AS^{t}_{i}$, $P(AS^{t}_{i})$ is the attack performance of example $AS^{t}_{i}$ calculated by Eq. 2, $Z(A^{t}_{i})$ is the perturbation size of the adversarial example calculated by Eq. 9, and $A^{t}_{i}+S=AS^{t}_{i}$. $Z(A^{t}_{i})$ is a novel perturbation metric proposed in this paper, which is explained and compared in Sec. 4. $\alpha$ is the proportional coefficient that balances attack performance against perturbation size. For example, when $\alpha=0$, the fitness function only considers attack performance $P(AS^{t}_{i})$. When $\alpha$ is larger, the optimization process pays more attention to the size of the perturbation, so the optimized adversarial example may have only moderate attack performance but a very small perturbation.

[TABLE]

where $y_{0}$ is the true label of the adversarial image, and $y_{1}$ and $y_{2}$ are the labels with the highest and second-highest confidence in the $TM$ output for $AS^{t}_{i}$, respectively; $y_{1}$ is therefore also the output label. $p(y|AS^{t}_{i})$ is the confidence with which the $TM$ labels the adversarial example $AS^{t}_{i}$ as $y$. When the output label differs from the true label $y_{0}$ of $AS^{t}_{i}$, the $TM$ outputs a wrong label and the attack is successful; the attack performance is then the confidence difference between the label $y_{1}$ and the true label $y_{0}$. The greater this difference, the stronger the attack. Otherwise, the output label is the same as the true label, which means the attack failed. In that case we calculate the confidence difference between the highest label and the second-highest label. The larger this difference, the more difficult it is for the attack to succeed.

In order to achieve the attack faster and reduce the initial number of queries to the DNN model, we do not consider the perturbation before the attack succeeds, i.e. we let $\alpha=0$. Only once the attack has succeeded do we consider both attack performance $P(AS^{t}_{i})$ and perturbation $Z(A^{t}_{i})$. The updated fitness function is as follows.

[TABLE]

where $t_{0}$ is the number of iterations when the initial attack succeeds, and $\frac{\alpha}{\max Z(A^{t_{0}})}$ is used to control the perturbation to a certain range.
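The two-phase fitness described in this section can be sketched in a few lines. This is a hedged illustration built only from the prose above: the function names are hypothetical, $Z(A^{t}_{i})$ is passed in as a precomputed number `z`, and the sign of the failed-attack branch (runner-up confidence minus top confidence, so that a larger value is always better) is an assumption.

```python
def attack_performance(confidences, true_label):
    """P(AS): confidence margin, following the description around Eq. 2.

    confidences: list of per-class confidences returned by the target model.
    """
    ranked = sorted(range(len(confidences)), key=lambda k: confidences[k], reverse=True)
    y1, y2 = ranked[0], ranked[1]
    if y1 != true_label:
        # Attack succeeded: margin of the wrong top label over the true label.
        return confidences[y1] - confidences[true_label]
    # Attack failed: negative margin between the runner-up and the (true) top label.
    return confidences[y2] - confidences[y1]


def fitness(confidences, true_label, z, alpha, z_max_at_success=None):
    """phi(AS): attack performance minus a scaled perturbation size.

    z is Z(A), computed elsewhere. While no attack has succeeded yet
    (z_max_at_success is None) the perturbation term is ignored (alpha = 0).
    Afterwards z is scaled by alpha / max Z(A^{t0}).
    """
    p = attack_performance(confidences, true_label)
    if z_max_at_success is None:
        return p
    return p - alpha / z_max_at_success * z
```

For instance, with confidences `[0.1, 0.7, 0.2]` and true label `0`, the attack has succeeded and the margin is `0.7 - 0.1`; the perturbation penalty is subtracted only once `z_max_at_success` is known.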

3.5 Evolutionary operations

3.5.1 Selection operator

The selection operator chooses two parent examples, which produce two children through the crossover and mutation operators. Generally, higher-quality examples are more likely to be selected. This paper uses roulette wheel selection [60]. For a given example $AS^{t}_{i}$, its selection probability is calculated according to Eq. 4, and the interval $(fr(AS^{t}_{i-1}),fr(AS^{t}_{i})]$ assigned to each example is calculated according to Eq. 5.

[TABLE]

For better understanding, we give a simple illustration of roulette wheel selection. Let $n=6$; Table 3 shows $\phi(AS^{t}_{i})$, $f(AS^{t}_{i})$ and $fr(AS^{t}_{i})$ of the adversarial examples. The table shows that the adversarial example $AS^{t}_{5}$ has the largest fitness value, which means it is the best example among all. Its probability of being selected as a parent is therefore also the largest, i.e. $0.21$. The selection probabilities of all examples sum to $1$.
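Roulette wheel selection as described here can be sketched with the Python standard library. The sketch assumes the usual form of the method, where the selection probability is the fitness divided by the sum of all fitness values and the interval bounds are the cumulative sums of those probabilities. The shift applied to negative fitness values and the uniform fallback are assumptions added to keep the example well defined; they are not taken from the paper.

```python
import bisect
import itertools
import random


def roulette_select(fitness_values, rng=random):
    """Return the index of one parent, drawn proportionally to fitness."""
    low = min(fitness_values)
    # Assumption: shift negative fitness values so that all weights are >= 0.
    weights = [f - low for f in fitness_values] if low < 0 else list(fitness_values)
    total = sum(weights)
    if total <= 0:
        # Degenerate wheel (all weights zero): fall back to a uniform choice.
        return rng.randrange(len(weights))
    probs = [w / total for w in weights]              # selection probabilities f
    cumulative = list(itertools.accumulate(probs))    # interval upper bounds fr
    r = rng.random()
    # Locate the interval (fr_{i-1}, fr_i] that contains r.
    idx = bisect.bisect_left(cumulative, r)
    return min(idx, len(cumulative) - 1)              # guard against rounding
```

Calling `roulette_select` twice gives the two parents for one crossover; an example with a larger share of the total fitness owns a wider interval and is therefore drawn more often.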

3.5.2 Crossover operator

The crossover operator generates new examples from the selected examples. This paper uses uniform crossover, in which the genes at each locus of two paired individuals are exchanged with the same crossover probability, forming two new individuals. Figure 3(a) shows the schematic of crossover and mutation of an example. To make the crossover process clearer, we display it in two steps. Unlike traditional chromosome crossover, our perturbation crossover operates on two-dimensional matrices. $A^{t}_{i}$ and $A^{t}_{j}$ are the two selected parent perturbations, represented in Figure 3(a) by the blue and yellow matrices, respectively. They are each divided into two parts according to the same cross matrix $B$, a two-dimensional matrix; at the positions selected by $B$, the RGB pixel values are exchanged. Finally, the parts are recombined into the new perturbations $A^{1}_{cross}$ and $A^{2}_{cross}$, which are passed to the next step. This process is expressed by Eq. 6 and Eq. 7. A specific example is given in Figure 3(b).

[TABLE]

where $B$ is a matrix of the same size as the image, and each element of $B$ is randomly $0$ or $1$. $rand(0,1)$ denotes a randomly generated number between 0 and 1. $P_{c}$ is the crossover probability. Figure 11 in the appendix shows the effect of $P_{c}$ on the experimental results: the larger $P_{c}$ is, the faster the fitness function converges. This is mainly because we use parent-child mixed selection for the generation update and our mutation probability is very low. When $P_{c}<1$, a child is very likely to be identical to its parent, i.e. duplicate examples are generated, which adds unnecessary queries and increases attack time and cost. Therefore, to reduce the cost of the attack, this paper sets $P_{c}=1$.
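A minimal NumPy sketch of this uniform crossover is given below. It assumes that the children are formed as $A_{i}\odot B+A_{j}\odot(1-B)$ and its mirror image, that one mask entry governs all colour channels of a pixel, and that with probability $1-P_{c}$ the parents are copied unchanged; the function name and the `rng` parameter are hypothetical.

```python
import numpy as np


def uniform_crossover(parent_a, parent_b, p_c=1.0, rng=None):
    """Exchange pixels of two perturbations according to a random 0/1 mask B."""
    rng = np.random.default_rng() if rng is None else rng
    # With probability 1 - p_c no crossover happens and the parents are copied.
    if rng.random() >= p_c:
        return parent_a.copy(), parent_b.copy()
    # One mask entry per pixel location (height x width).
    mask = rng.integers(0, 2, size=parent_a.shape[:2])
    if parent_a.ndim == 3:
        # Broadcast over channels so the RGB values of a pixel move together.
        mask = mask[..., None]
    child_1 = parent_a * mask + parent_b * (1 - mask)
    child_2 = parent_a * (1 - mask) + parent_b * mask
    return child_1, child_2
```

Because every pixel goes to exactly one child, the two children together always contain the same values as the two parents. With `p_c=1.0` the copy branch is never taken, which corresponds to the setting chosen in the paper to avoid duplicate queries.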

3.5.3 Mutation operator

The mutation operator alters some examples with a certain probability during the breeding process. In POBA-GA, we adopt multi-point mutation: according to the mutation probability $P_{m}$, several pixels of $AS^{q}_{cross}$, where $q\in\{1,2\}$, are randomly selected and altered. The effect of mutation is shown in Figure 2, and the specific process is shown in Figure 3. The process is defined by Eq. 8.

[TABLE]

where $C$ is a matrix of the same size as the image. Except for a small number of elements of $C$, which lie between 0 and 2, all elements are 1. $P_{m}$ is the mutation probability. Based on experiments, we set $P_{m}=0.001$ for the ImageNet data set and $P_{m}=0.003$ for the MNIST and CIFAR-10 data sets.
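The role of the matrix $C$ can be illustrated as follows. The sketch assumes that mutation multiplies the crossed perturbation element-wise by $C$ and that the mutated entries of $C$ are drawn uniformly from $[0,2]$; both choices are inferred from the description rather than confirmed by Eq. 8, and the function name is hypothetical.

```python
import numpy as np


def multi_point_mutation(perturbation, p_m=0.001, rng=None):
    """Rescale a random subset of perturbation entries by factors in [0, 2]."""
    rng = np.random.default_rng() if rng is None else rng
    # C equals 1 everywhere except at the positions chosen for mutation.
    factors = np.ones(perturbation.shape, dtype=np.float64)
    mutate = rng.random(perturbation.shape) < p_m
    factors[mutate] = rng.uniform(0.0, 2.0, size=int(mutate.sum()))
    return perturbation * factors
```

Under this multiplicative reading, a factor below 1 shrinks the perturbation at a pixel, a factor above 1 amplifies it, and entries that are exactly zero stay zero.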

Algorithm 1 shows the pseudo code of POBA-GA, and Algorithm 2 shows the pseudo code of the function in POBA-GA.

3.6 Generation update

The generation is updated by parent-child mixed selection: the $N$ perturbations with the highest fitness among the parents $A^{t}_{i}$ and the children $A^{t+1}_{i}$, where $N$ is the population size, form the next generation $A^{t+1}_{i}$. This prevents the best individual of the current population from being lost in the next generation, which could keep the genetic algorithm from converging to the global optimal solution. The termination condition in this paper is that the maximum number of iterations is reached or the fitness function exceeds a given value $\gamma$.
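The generation update and the termination test can be sketched as below. The function names and the `fitness_fn` callback are hypothetical; in a real attack the fitness of each individual would be cached, since every evaluation costs a query to the target model.

```python
def update_generation(parents, children, fitness_fn, population_size):
    """Keep the fittest `population_size` individuals among parents and children."""
    pool = list(parents) + list(children)
    # sorted() is stable, so a parent is kept ahead of a child with equal fitness.
    ranked = sorted(pool, key=fitness_fn, reverse=True)
    return ranked[:population_size]


def should_stop(generation, best_fitness, max_generations, gamma):
    """Stop when the iteration budget is spent or the fitness exceeds gamma."""
    return generation >= max_generations or best_fitness > gamma
```

Because parents compete with their own children, the best individual found so far can only be replaced by something at least as fit, which is the elitism property the paragraph above relies on.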

4 Evaluation Metrics

In general, researchers use $L_{0}$, $L_{2}$ and $L_{\infty}$ to calculate the perturbation size. However, this paper finds problems with these three metrics, and therefore proposes a perturbation evaluation metric derived from the sigmoid function. For naked-eye evaluation, our metric agrees better with visual assessment of the perturbation. For machine evaluation, our metric speeds up the reduction of perturbation in an adversarial attack and effectively reduces the difference between adversarial and original examples.

4.1 Naked-eye evaluation

Current adversarial attack methods often use the p-norm $L_{p}$ to limit the perturbations [1] of adversarial examples. For example, the $L_{0}$ distance measures the number of pixels that have changed, the $L_{2}$ distance measures the Euclidean distance between two images, and the $L_{\infty}$ distance measures the maximum change to any single pixel.
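For reference, the three distances can be computed as in the following NumPy sketch. The function name is hypothetical, and $L_{0}$ is counted over array entries here, so a changed RGB pixel may contribute up to three.

```python
import numpy as np


def lp_distances(original, adversarial):
    """Return the L0, L2 and L-infinity distances between two images."""
    diff = adversarial.astype(np.float64) - original.astype(np.float64)
    return {
        "L0": int(np.count_nonzero(diff)),         # number of changed entries
        "L2": float(np.sqrt(np.sum(diff ** 2))),   # Euclidean distance
        "Linf": float(np.max(np.abs(diff))),       # largest single change
    }
```

Note that none of the three values depends on whether a given change is visible to a human observer; each is a purely arithmetic summary of the difference image.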

However, none of these metrics can express the relationship between the magnitudes of the nine perturbations in Figure 4. $L_{0}$ and $L_{\infty}$ cannot compare the perturbation sizes of these nine pictures, and $L_{2}$ rates the difference between the $9^{th}$ picture and the $7^{th}$ picture as equal to the difference between the $1^{st}$ picture and the $5^{th}$ picture. In fact, it is difficult to distinguish the last three pictures from one another, or the first two, whereas the $5^{th}$ and $6^{th}$ pictures can be clearly distinguished.

Perceived perturbation size behaves more like a sigmoid curve. Inspired by this function, we propose a new perturbation metric $Z(A^{t}_{i})$. For a given perturbation $A^{t}_{i}$, its size is calculated by Eq. 9.

[TABLE]

where $m_{a}$ and $m_{b}$ are the height and width of the image. $AS^{t}_{i(ab)}$ is the perturbation pixel in row $a$ and column $b$. $pm_{1}$ and $pm_{2}$ are the parameters that adjust the perturbation pixel mapping rule. For naked-eye evaluation, we set $pm_{1}=10$, $pm_{2}=5.8$.

Figure 5 compares the new perturbation evaluation metric with $L_{2}$ for a single perturbed pixel. The red line is the value given by our metric, and the blue line is the value given by $L_{2}$. Combining Figure 4 and Figure 5, we find that $Z(A^{t}_{i})$ is more in line with the sensitivity of the human eye to perturbation. The metric proposed in this paper can effectively compensate for the weaknesses of $L_{0}$, $L_{2}$ and $L_{\infty}$.

4.2 Machine evaluation

In the previous section we showed that $Z(A^{t}_{i})$ is more in line with the sensitivity of the human eye to perturbation. In this section we show that $Z(A^{t}_{i})$ can better distinguish small perturbations, which makes the perturbation of adversarial examples generated by adversarial attacks smaller. Since the maximum perturbation is generally less than $0.43$ ($110/255$), this paper sets $pm_{1}=15$, $pm_{2}=3$.

To compare the impact of the $Z(A^{t}_{i})$ metric and the $L_{2}$ metric on the POBA-GA attack, we set the fitness function to $\phi(AS^{t}_{i})=P(AS^{t}_{i})-\frac{\alpha}{\max Z(A^{0})}Z(A^{t}_{i})$ and $\phi(AS^{t}_{i})=P(AS^{t}_{i})-\frac{\alpha}{\max||A^{0}_{i}||_{2}}||A^{t}_{i}||_{2}$, respectively. The experimental settings follow Section 5, with 100 generations of iteration. The experimental results are shown in Table 4. Note that, to demonstrate the validity of our metric, the perturbation of the resulting adversarial examples is measured with $L_{2}$ in both cases. Table 4 shows that our metric speeds up the reduction of perturbation in an adversarial attack and effectively reduces the difference between adversarial and original examples. This is mainly because $Z(A^{t}_{i})$ amplifies the differences between perturbations, so the optimization moves faster toward smaller perturbations. The table also shows that the attack success rate is 96% in both cases: because we do not consider the perturbation before the attack succeeds, the metric does not affect the attack success rate.

5 Experiments

In this section, we run experiments on parameter sensitivity, attack performance, universality of the attack algorithm, its impact on model robustness, and the practical application of the algorithm. In 5.1, we describe the platform, databases, DNN models and attack implementation, so that readers can reproduce the experiments. In 5.2, we analyze the effect of parameter $\alpha$, so that users can quickly select a suitable $\alpha$ according to their needs. In 5.3, we compare POBA-GA with high-quality white-box and black-box attacks, including the state-of-the-art white-box attack C&W and black-box attack AutoZOOM [58]. In 5.4, we study targeted attacks and the case where the black-box model returns only a single binary outcome, to show that POBA-GA achieves good attack performance and universality in such situations. In 5.5, we compare the defense performance of POBA-GA adversarial training and ensemble adversarial training, aiming to show that the defense capability of POBA-GA adversarial training is stronger than that of ensemble adversarial training. In 5.6, we attack a face recognition system, aiming to show that POBA-GA performs well not only on the experimental data sets but also in the real world and on other data sets.

5.1 Experiment setup

Platform: The platform for all experiments is i7-7700K 4.20GHzx8 (CPU), TITAN Xp 12GiBx2 (GPU), 16GBx4 memory (DDR4), Ubuntu 16.04 (OS), Python 3.5, Tensorflow-gpu-1.3, Tflearn-0.3.2 (Tflearn can be downloaded at https://github.com/tflearn/tflearn).

Database: We use three publicly available image databases, namely MNIST (available at http://yann.lecun.com/exdb/mnist/), CIFAR-10 (available at https://www.cs.toronto.edu/~kriz/cifar.html) and ImageNet64 (available at http://image-net.org/download-images). MNIST is a handwritten digit recognition data set containing 70,000 grayscale images of size $28\times 28$, divided into 10 classes. CIFAR-10 is also a small-image data set, containing 60,000 color images of size $32\times 32$, divided into 10 classes. ImageNet is a large image data set containing about 15 million images in about 22,000 classes. Before the experiment, we resized its images to $224\times 224$.

Baseline methods: We compare POBA-GA with seven baselines, four white-box attacks and three black-box attacks: Fast Gradient Sign Method (FGSM) [36], DeepFool [23], Basic Iterative Method (BIM) [27], Carlini and Wagner attacks (C&W) [37], Boundary [26], ZOO [50] and AutoZOOM [58]. These baselines are classic and efficient attack methods, including the state-of-the-art white-box attack C&W and black-box attack AutoZOOM.

DNN Models: For MNIST and CIFAR-10, we use the same DNN model as in the C&W attack [37] (https://github.com/carlini/nn_robust_attacks, commit 1193c79), which is also used by ZOO and Boundary [26]. On ImageNet we use the same pretrained networks as Boundary, namely VGG19 [61], Resnet50 [62] and Inception-V3 (Inc-V3) [63], provided by Keras (https://github.com/fchollet/keras, commit 1b5d54).

Attack implementation: To keep the comparative experiments accurate, the parameters of the compared algorithms are taken directly from the corresponding literature. POBA-GA performs 100 iterations on MNIST and CIFAR-10, generating 20 descendants per iteration; the variance is taken from $[5,10,15,20,25]$ and the number of noise points from $[50,100,150,200,250]$. On ImageNet, POBA-GA performs 400 iterations, generating 50 descendants per iteration; the variance is taken from $[5,10,15,20,25]$ and the number of noise points from $[5000,7500,10000,12500,15000]$.

All experimental results in this paper are average values. For MNIST and CIFAR-10, we evaluated 1000 randomly selected examples from the validation set; for ImageNet we used 250 images.

Evaluation metric: This paper uses the attack success rate (ASR) [64], perturbation (per-pixel $L_{2}$ ) and query number to evaluate the performance of the experiment. The ASR is used to evaluate the attack probability of the attack method against the target model, which is calculated by Eq. 10.

[TABLE]

where $sumNum(.)$ is the number of examples.

$L_{2}$ norm (i.e. Euclidean distance) is used to quantify the difference between the adversarial and the original examples. Query number refers to how many examples the target model needs to predict before the attack method reaches the stop condition. This metric is especially important when the target model has a limit on the number of queries.

5.2 Influence of perturbation ratio parameter $\alpha$

The perturbation weight $\alpha$ is used to adjust the balance between perturbation $Z(A^{t}_{i})$ and attack performance $P(AS^{t}_{i})$. When $\alpha$ is larger, the example is more similar to the original one, but the attack performance is lower. When $\alpha$ is smaller, the perturbation of the adversarial example is larger and the attack performance is higher. The appropriate setting of $\alpha$ changes with the attack requirements, so we analyze the effect of $\alpha$ on the experiment so that users can quickly select a suitable $\alpha$ according to their needs. In addition, users with clear expectations can also tune $\alpha$ with automatic methods such as irace (https://cran.r-project.org/web/packages/irace/index.html).

Figure 6 shows the influence of $\alpha$ on the ImageNet64 data set. From the figures, we can conclude that as $\alpha$ increases, both the attack performance $P$ and the perturbation $Z$ decrease. This is mainly because, as $\alpha$ increases, the perturbation term weighs more heavily in the fitness, so the adversarial example becomes more similar to the original image and the attack performance $P$ gradually decreases.

According to different practical needs, a different $\alpha$ can be chosen. In this paper, we set $\alpha=3$, because beyond this value the perturbation hardly decreases further as $\alpha$ increases, while $P$ is still as high as 0.9. It should be noted that although $\alpha$ affects the value of $P$, we only consider the perturbation after the attack is successful, so in general $\alpha$ does not affect the attack success rate.

5.3 Attack performance comparison

To verify the effectiveness of our approach, we compared POBA-GA with the classic white-box and black-box attack methods. We found that POBA-GA has a high attack success rate, even surpassing some white-box attacks, and it can initially achieve successful attacks with a small number of queries. Table 5 shows the attack success rate, perturbations and attack time cost of the different adversarial attacks. The number of queries in parentheses represents the number of queries when the initial attack succeeds, which is used to assist in comparing the performance of our algorithm with ZOO and AutoZOOM algorithms.

5.3.1 Perturbation

The adversarial examples generated by POBA-GA have less perturbation at the same attack success rate. Especially on the MNIST and CIFAR-10 data sets, examples generated by our method have less perturbation than most white-box and black-box attacks, mainly because the images are small and the genetic algorithm finds the approximate global optimal solution more easily. For the ImageNet64 data set, since white-box attack methods have access to the internal structure of the target model, we only compare with the black-box attacks. From Table 5 we find that the perturbation of POBA-GA is significantly smaller than that of ZOO and AutoZOOM. Although POBA-GA's perturbation is larger than Boundary's, POBA-GA needs fewer queries. In addition, when $\alpha=10$ and the number of queries is 54000, the perturbation generated by POBA-GA can reach 8.1e-07 without affecting the attack success rate, although the attack performance $P$ decreases. Therefore POBA-GA performs well in terms of perturbation. It is reasonable that the perturbations of POBA-GA are much smaller than those of the baselines, since the initial perturbations are very small and the GA optimization gradually reduces the perturbation over the iterations.

5.3.2 Attack success rate

POBA-GA has a high attack success rate, even higher than that of some white-box attacks. Usually a white-box attack, which is conducted with knowledge of the internal structure and parameters of the model, achieves a higher success rate. The attack success rate of DeepFool is low, mainly because it imposes strict requirements on the perturbation. Table 5 shows that ZOO also has a high attack success rate; however, it relies on a large number of queries and iterations.

Specifically, on the ImageNet64 data set, ZOO requires about 220 thousand queries for a 90% attack success rate and AutoZOOM requires about 1600 queries for a 100% attack success rate, while POBA-GA only needs about 500 queries for a 96% attack success rate. In addition, the perturbation of POBA-GA is only one-fifth of that of ZOO and AutoZOOM, or even smaller. On the whole, POBA-GA achieves state-of-the-art black-box attack performance when both attack success rate and perturbation are considered. Although its success rate is slightly lower than AutoZOOM's, POBA-GA needs far fewer queries and smaller perturbations. We conclude that POBA-GA is more practical in real-world applications where the time budget is strict and a high (though less than 100%) attack success rate is acceptable.

POBA-GA's high attack success rate is mainly due to the fact that we do not consider the perturbation before the attack succeeds. Note, however, that even so the perturbation does not become very large, since it is limited at initialization.

5.3.3 Query count

POBA-GA needs fewer queries than other black-box attacks. Since white-box attacks are conducted with knowledge of the internal structure of the model, we only analyze black-box attacks. From Table 5, we find that POBA-GA and AutoZOOM need significantly fewer queries than Boundary and ZOO. AutoZOOM reduces the query count by adopting an adaptive random full gradient estimation strategy to strike a balance between query counts and estimation errors, and features a decoder (AE or BiLIN) for attack dimension reduction and algorithm acceleration. POBA-GA reduces the number of queries through the genetic algorithm, for the following main reasons: 1) when selecting examples, examples with high fitness are more likely to be chosen; 2) the high probability of crossover, together with mutation, increases example diversity; 3) parent-child mixed selection retains the best examples; 4) the effect of the perturbation is not considered before the attack succeeds.

5.4 Universality

Targeted attack: A non-targeted attack only needs to make the $TM$ output a label different from the correct one, whereas a targeted attack needs the target model to output a specified label. To verify the universality of the POBA-GA algorithm, we also carry out a targeted attack on $TM$. The fitness function is designed as $\phi(AS^{t}_{i})=p(y_{tar}|AS^{t}_{i})-p(y_{0}|AS^{t}_{i})-\alpha Z(AS^{t}_{i})$, where $p(y_{tar}|AS^{t}_{i})$ is the probability that the target model classifies the input picture as label $y_{tar}$. Figure 7 shows the targeted attack. Each row represents the original label, and each column represents the output label of the target model. The average fitness of the adversarial examples is $\phi(AS_{opt})=0.53$, with perturbation $Z(A_{opt})=3.846$. Therefore, the POBA-GA method can implement targeted attacks. We randomly selected 50 examples from the ImageNet64 data set to attack VGG19, Inc-V3 and Resnet50. The attack success rates were $82\%$, $80\%$ and $84\%$, respectively.
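The targeted fitness stated above translates directly into code. In this small sketch the function name is hypothetical and the perturbation size $Z$ is assumed to be computed elsewhere and passed in as `z`.

```python
def targeted_fitness(confidences, true_label, target_label, z, alpha):
    """phi = p(y_tar | AS) - p(y_0 | AS) - alpha * Z for a targeted attack."""
    return confidences[target_label] - confidences[true_label] - alpha * z
```

The value grows as the target label gains confidence, as the true label loses confidence, and as the perturbation shrinks, so the same selection, crossover and mutation operators can be reused unchanged.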

A single binary outcome: In general, a DNN model returns the confidence of the classification, but some deep models return only a single binary outcome. Therefore, we also experiment with such models. Unlike with RCC (return confidence of classification), with RSB (return a single binary) we cannot calculate the attack performance $P(AS^{t}_{i})$ directly. Therefore, we use Monte Carlo approximation to estimate the confidence of the classification, and then optimize the perturbation with POBA-GA. The Monte Carlo estimate of the confidence of $AS^{t}_{i}$ is given by Eq. 11.

[TABLE]

where $\delta\sim\mathcal{N}(0,30)$, and $N^{\prime}=100$ is the number of examples used to estimate the confidence of $AS^{t}_{i}$. $R(AS^{t}_{i}+\delta)$ is the binary outcome for $AS^{t}_{i}+\delta$; for example, when $AS^{t}_{i}+\delta$ is predicted to be the second class, $R(AS^{t}_{i}+\delta)=[0,1,0,...,0]$.

Table 6 compares RCC (return confidence of classification) and RSB (return a single binary) on VGG19. Table 6 shows that if the model returns only a single binary outcome, the number of queries increases by about 100 times, mainly because the confidence of each example must be estimated through 100 queries. The attack success rate drops from 98% to 73%, mainly because the Monte Carlo estimate of the confidence still differs from the true confidence. To increase the attack success rate, one can increase $N^{\prime}$ or the number of iterations.

Real world experimentation: In this section, we perform a real-world experiment on GCP (which exposes a black-box model) to demonstrate the applicability of the proposed method. Considering the limited number of model queries in a practical setting, we issue only 1000 queries, i.e. 50 generations with 20 examples per generation. The experimental results are shown in Figure 8. Figure 8(a) is the prediction for the original example, Figure 8(b) is the prediction for one of the initial adversarial examples, and Figure 8(c) is the prediction for a $50^{th}$-generation adversarial example. The results show that if we do not consider the perturbation, we can quickly achieve the attack on GCP. During the optimization process, we optimized the perturbation while keeping the attack successful, and increased the confidence of "eggs" from $64\%$ to $83\%$. If the querying continued, the confidence of the real class label would decrease further. Therefore, POBA-GA can attack GCP in a real-world setting, which demonstrates the applicability of the proposed method.
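The Monte Carlo estimate can be sketched as follows, assuming it averages the one-hot outcomes $R(AS^{t}_{i}+\delta)$ over $N^{\prime}$ noisy copies of the example. The callback `hard_label_fn` stands in for a model that returns only a class index and is hypothetical, as is the reading of 30 as the variance of the noise.

```python
import numpy as np


def estimate_confidence(hard_label_fn, example, num_classes, n_samples=100,
                        noise_variance=30.0, rng=None):
    """Estimate per-class confidence from hard-label queries on noisy copies."""
    rng = np.random.default_rng() if rng is None else rng
    votes = np.zeros(num_classes)
    for _ in range(n_samples):
        # delta ~ N(0, noise_variance); every noisy copy costs one model query.
        delta = rng.normal(0.0, np.sqrt(noise_variance), size=example.shape)
        votes[hard_label_fn(example + delta)] += 1
    # Averaging the one-hot outcomes gives the fraction of votes per class.
    return votes / n_samples
```

Each fitness evaluation then needs `n_samples` queries instead of one, which is why the query count rises sharply for models that return only a label.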

5.5 Defensiveness from POBA-GA adversarial training

In this section, we compare the defense performance of POBA-GA adversarial training and ensemble adversarial training, and show that the defense capability of ensemble adversarial training is comparatively weak. The defender adds the adversarial examples generated by POBA-GA to the training data set of the defense model to improve the robustness of the model. Adversarial training in this paper is achieved by retraining the model after adding the adversarial examples to the training data set. The retraining data set has 20 different classes, each consisting of 900 normal examples and 100 adversarial examples. In ensemble adversarial training, we use FGSM, JSMA, DeepFool, BIM and C&W to generate $5\times 20$ adversarial examples for each class. In POBA-GA adversarial training, we add 100 POBA-GA adversarial examples for each class to the training data set. Both POBA-GA adversarial training and ensemble adversarial training use a batch size of 100 and 10,000 $STEPS$ (number of iterations). The $LR$ (learning rate) is calculated by Eq. 12.

[TABLE]

where $BLR$ is the Base Learning Rate, $MLR$ is the Minimum Learning Rate and $i$ is the current iterations number. In this paper we make $BLR=0.1$ , $MLR=0.001$ . The adversarial training process is shown in Figure 9.

Ensemble adversarial training: The classifier is retrained on normal examples together with adversarial examples generated by an ensemble of attacks. In our case, FGSM, JSMA, DeepFool, BIM and C&W are adopted as the ensemble to attack $TM$.

POBA-GA adversarial training: Adversarial training is applied to $TM$ based on adversarial examples generated by POBA-GA. POBA-GA can provide high-quality adversarial examples for training which could improve defensibility of the target model.

Table 7 compares the target model's defensibility after ensemble adversarial training and after POBA-GA adversarial training. The attack success rates of almost all attacks decrease significantly compared with those before adversarial training (Table 5). The results also show that the attack success rates on MNIST and CIFAR-10 are lower than on ImageNet64. This is mainly because their images have fewer pixels and their adversarial examples are similar to one another, while ImageNet64 images have more pixels and therefore offer more ways to perturb. Adversarial training based on POBA-GA is less effective than ensemble adversarial training against white-box attacks, mainly because ensemble adversarial training is built on adversarial examples generated by white-box attacks. However, against black-box attacks, POBA-GA adversarial training defends better than ensemble adversarial training. Most importantly, adversarial training reduces the attack success rate of POBA-GA to about 30%.

5.6 Facial recognition application

Our experiments so far were mainly carried out on models such as VGG19, Inc-V3 and Resnet50. However, such models are not widely used in practical applications. Therefore, we expand the application scenario of POBA-GA. In recent years, facial recognition-based identity authentication systems have become popular and widely used in daily life [65, 66], so the security of face recognition systems has become increasingly important.

We selected the Labeled Faces in the Wild data set (LFW) [67], which contains more than 5,000 faces and more than 10,000 images, as the experimental data set. We use POBA-GA to attack LFW faces and generate the corresponding adversarial examples. Figure 10 shows the original images with their true labels, and the adversarial examples generated by POBA-GA with the labels predicted by the target model. The figure shows that POBA-GA can indeed attack face recognition through perturbation optimization, which indicates strong applicability.

6 Conclusion

Adversarial attacks against deep models can cause fundamental errors. We focus on black-box attacks since they are more practical to carry out and harder to defend against. Different from current black-box adversarial attacks, we propose a novel adversarial example generation method based on an evolutionary algorithm: a perturbation optimized black-box attack (POBA-GA) against deep neural networks. Extensive experiments compare it with classic white-box and black-box adversarial attack methods. The results show that POBA-GA has a higher attack success rate than other attack methods. It achieves a 100% attack success rate on the CIFAR-10 and MNIST classification models and a 96% attack success rate on ImageNet64 in the black-box setting. In both attack success rate and perturbation control, POBA-GA performs better than existing black-box attacks. In future work, we will study the transferability of POBA-GA attacks across different target models.

Acknowledgment

This work is partially supported by National Natural Science Foundation of China (61502423, 61572439), Zhejiang Natural Science Foundation(LY19F020025), Signal Recognition Based on GAN, Deep Learning for Enhancement Recognition Project, Zhejiang University Open Fund(2018KFJJ07), and Zhejiang Science and Technology Plan Project (2017C33149).

References

[1]

I. Goodfellow, Y. Bengio, A. Courville, Deep learning, Vol. 1, MIT press Cambridge, 2016.

[2]

S. Deng, L. Huang, G. Xu, X. Wu, Z. Wu, On deep learning for trust-aware recommendations in social networks, IEEE transactions on neural networks and learning systems 28 (5) (2017) 1164–1177.

[3]

G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, C. I. Sánchez, A survey on deep learning in medical image analysis, Medical image analysis 42 (2017) 60–88.

[4]

T. Kooi, G. Litjens, B. van Ginneken, A. Gubern-Mérida, C. I. Sánchez, R. Mann, A. den Heeten, N. Karssemeijer, Large scale deep learning for computer aided detection of mammographic lesions, Medical image analysis 35 (2017) 303–312.

[5]

M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, H. Larochelle, Brain tumor segmentation with deep neural networks, Medical image analysis 35 (2017) 18–31.

[6]

Y. Geifman, R. El-Yaniv, Selective classification for deep neural networks, in: Advances in neural information processing systems, 2017, pp. 4885–4894.

[7]

S. De Cnudde, D. Martens, F. Provost, et al., An exploratory study towards applying and demystifying deep learning classification on behavioral big data, Tech. rep. (2018).

[8]

J. Stilgoe, Machine learning, social learning and the governance of self-driving cars, Social studies of science 48 (1) (2018) 25–56.

[9]

C. Szegedy, V. O. Vanhoucke, Processing images using deep neural networks, US Patent 9,715,642 (Jul. 25 2017).

[10]

J. Yosinski, J. Clune, Y. Bengio, H. Lipson, How transferable are features in deep neural networks?, in: Advances in neural information processing systems, 2014, pp. 3320–3328.

[11]

Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, A. Farhadi, Target-driven visual navigation in indoor scenes using deep reinforcement learning, in: Robotics and Automation (ICRA), 2017 IEEE International Conference on, IEEE, 2017, pp. 3357–3364.

[12]

Y. Yuan, L. Mou, X. Lu, Scene recognition by manifold regularized deep learning architecture, IEEE transactions on neural networks and learning systems 26 (10) (2015) 2222–2233.

[13]

Q. Liu, T. Liu, Z. Liu, Y. Wang, Y. Jin, W. Wen, Security analysis and enhancement of model compressed deep learning systems under adversarial attacks, in: Proceedings of the 23rd Asia and South Pacific Design Automation Conference, IEEE Press, 2018, pp. 721–726.

[14]

A. Ramanathan, L. Pullum, Z. Husein, S. Raj, N. Torosdagli, S. Pattanaik, S. K. Jha, Adversarial attacks on computer vision algorithms using natural perturbations, in: Contemporary Computing (IC3), 2017 Tenth International Conference on, IEEE, 2017, pp. 1–6.

[15]

W. Bai, C. Quan, Z. Luo, Alleviating adversarial attacks via convolutional autoencoder, in: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2017 18th IEEE/ACIS International Conference on, IEEE, 2017, pp. 53–58.

[16]

Z. Yin, F. Wang, W. Liu, S. Chawla, Sparse feature attacks in adversarial learning, IEEE Transactions on Knowledge and Data Engineering 30 (6) (2018) 1164–1177.

[17]

J. H. Metzen, M. C. Kumar, T. Brox, V. Fischer, Universal adversarial perturbations against semantic image segmentation.

[18]

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199.

[19]

M. Sharif, S. Bhagavatula, L. Bauer, M. K. Reiter, Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2016, pp. 1528–1540.

[20]

K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, D. Song, Robust physical-world attacks on deep learning models, arXiv preprint arXiv:1707.08945.

[21]

W. Xu, Y. Qi, D. Evans, Automatically evading classifiers, in: Proceedings of the 2016 Network and Distributed Systems Symposium, 2016.

[22]

N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, A. Swami, The limitations of deep learning in adversarial settings, in: Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, IEEE, 2016, pp. 372–387.

[23]

S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, Deepfool: a simple and accurate method to fool deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2574–2582.

[24]

J. Su, D. V. Vargas, S. Kouichi, One pixel attack for fooling deep neural networks, arXiv preprint arXiv:1710.08864.

[25]

A. Ilyas, L. Engstrom, A. Athalye, J. Lin, Black-box adversarial attacks with limited queries and information, arXiv preprint arXiv:1804.08598.

[26]

W. Brendel, J. Rauber, M. Bethge, Decision-based adversarial attacks: Reliable attacks against black-box machine learning models, arXiv preprint arXiv:1712.04248.

[27]

A. Kurakin, I. Goodfellow, S. Bengio, Adversarial examples in the physical world, arXiv preprint arXiv:1607.02533.

[28]

M. Cisse, Y. Adi, N. Neverova, J. Keshet, Houdini: Fooling deep structured prediction models, arXiv preprint arXiv:1707.05373.

[29]

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami, Practical black-box attacks against machine learning, in: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ACM, 2017, pp. 506–519.

[30]

S. M. Moosavidezfooli, A. Fawzi, O. Fawzi, P. Frossard, Universal adversarial perturbations, in: IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 86–94.

[31]

M. Talha, M. S. Saeed, G. Mohiuddin, M. Ahmad, M. J. Nazar, N. Javaid, Energy optimization in home energy management system using artificial fish swarm algorithm and genetic algorithm, in: International Conference on Intelligent Networking and Collaborative Systems, Springer, 2017, pp. 203–213.

[32]

R. Syahputra, Distribution network optimization based on genetic algorithm, Journal of Electrical Technology UMY 1 (1) (2017) 1–9.

[33]

J. M. Gil, J. F. A. Montes, E. Alba, J. Aldana-Montes, Optimizing ontology alignments by using genetic algorithms.

[34]

N. Goyal, R. Bhatia, M. Kumar, A genetic algorithm based focused web crawler for automatic webpage classification.

[35]

N. Akhtar, A. Mian, Threat of adversarial attacks on deep learning in computer vision: A survey, arXiv preprint arXiv:1801.00553.

[36]

I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572.

[37]

N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: Security and Privacy (SP), 2017 IEEE Symposium on, IEEE, 2017, pp. 39–57.

[38]

S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, Universal adversarial perturbations, arXiv preprint.

[39]

J. Hayes, G. Danezis, Machine learning as an adversarial service: Learning black-box adversarial examples, arXiv preprint arXiv:1708.05207.

[40]

N. Carlini, P. Mishra, T. Vaidya, Y. Zhang, M. Sherr, C. Shields, D. Wagner, W. Zhou, Hidden voice commands., in: USENIX Security Symposium, 2016, pp. 513–530.

[41]

W. Hu, Y. Tan, Black-box attacks against rnn based malware detection algorithms, arXiv preprint arXiv:1705.08131.

[42]

M. Sharif, S. Bhagavatula, L. Bauer, M. K. Reiter, Adversarial generative nets: Neural network attacks on state-of-the-art face recognition, arXiv preprint arXiv:1801.00349.

[43]

M. Sun, F. Tang, J. Yi, F. Wang, J. Zhou, Identify susceptible locations in medical records via adversarial attacks on deep predictive models, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 793–801.

[44]

T. S. Sethi, M. Kantardzic, Data driven exploratory attacks on black box classifiers in adversarial domains, Neurocomputing 289 (2018) 129–143.

[45]

A. J. Bose, P. Aarabi, Adversarial attacks on face detectors using neural net based constrained optimization, arXiv preprint arXiv:1805.12302.

[46]

F. Yu, Q. Dong, X. Chen, Asp: A fast adversarial attack example generation framework based on adversarial saliency prediction, arXiv preprint arXiv:1802.05763.

[47]

S.-T. Chen, C. Cornelius, J. Martin, D. H. Chau, Robust physical adversarial attack on faster r-cnn object detector, arXiv preprint arXiv:1804.05810.

[48]

F. Chen, J. Ma, An empirical identification method of gaussian blur parameter for image deblurring, IEEE Transactions on signal processing 57 (7) (2009) 2467–2478.

[49]

R. Varatharajan, K. Vasanth, M. Gunasekaran, M. Priyan, X. Gao, An adaptive decision based kriging interpolation algorithm for the removal of high density salt and pepper noise in images, Computers & Electrical Engineering.

[50]

P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, C.-J. Hsieh, Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models, in: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, ACM, 2017, pp. 15–26.

[51]

L. Smith, Y. Gal, Understanding measures of uncertainty for adversarial example detection, arXiv preprint arXiv:1803.08533.

[52]

Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, J. Li, Boosting adversarial attacks with momentum, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9185–9193.

[53]

M. A. A. Milton, Evaluation of momentum diverse input iterative fast gradient sign method (m-di2-fgsm) based attack method on mcs 2018 adversarial attacks on black box face recognition system, arXiv preprint arXiv:1806.08970.

[54]

S. Sarkar, A. Bansal, U. Mahbub, R. Chellappa, Upset and angri: Breaking high performance image classifiers, arXiv preprint arXiv:1707.01159.

[55]

E. Kalajac, A. Karabegović, M. Ponjavić, Optimization of the structures locations using a genetic algorithm in the transmission line design.

[56]

M. Alencar, J. Souza, B. Souza, W. Neves, Optimal allocation of photovoltaic panels in distribution network applying genetic algorithm, in: 2018 Simposio Brasileiro de Sistemas Eletricos (SBSE), IEEE, 2018.

[57]

L. A. Souza, C. L. Silva, W. P. Calixto, Optimized setting of directional overcurrent relays via genetic algorithm, in: 2018 Simposio Brasileiro de Sistemas Eletricos (SBSE), IEEE, 2018.

[58]

C. C. Tu, P. Ting, P. Y. Chen, S. Liu, H. Zhang, J. Yi, C. J. Hsieh, S. M. Cheng, Autozoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks.

[59]

A. Konak, D. W. Coit, A. E. Smith, Multi-objective optimization using genetic algorithms: A tutorial, Reliability Engineering & System Safety 91 (9) (2006) 992–1007.

[60]

A. Lipowski, D. Lipowska, Roulette-wheel selection via stochastic acceptance, Physica A: Statistical Mechanics and its Applications 391 (6) (2012) 2193–2196.

[61]

K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.

[62]

K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

[63]

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.

[64]

Y. Dong, F. Liao, T. Pang, H. Su, X. Hu, J. Li, J. Zhu, Boosting adversarial attacks with momentum.

[65]

W. Deng, J. Hu, N. Zhang, B. Chen, J. Guo, Fine-grained face verification: Fglfw database, baselines, and human-dcmn partnership, Pattern Recognition 66 (2017) 63–73.

[66]

H. Zhou, K.-M. Lam, Age-invariant face recognition based on identity inference from appearance age, Pattern Recognition 76 (2018) 191–202.

[67]

G. B. Huang, M. Mattar, T. Berg, E. Learned-Miller, Labeled faces in the wild: A database for studying face recognition in unconstrained environments, in: Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.

Appendix

Parameter adjustment

Figure 11 shows the effect of $P_{c}$ on the experimental results. Figure 11 (a) shows the best fitness function value among the parent adversarial examples. Figures 11 (b) and (c) show the perturbation and the attack performance $P$ of the examples with the best fitness function value.
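
To make the role of such a parameter concrete, the following is a minimal sketch of how a crossover probability typically gates recombination in a genetic algorithm. It assumes that $P_{c}$ denotes the crossover probability and that crossover swaps elements between two parents under a random binary mask; the function names `crossover` and `crossover_rate`, the flat-list representation of a perturbation, and the 50% mask density are illustrative assumptions, not the authors' implementation.

```python
import random


def crossover(parent_a, parent_b, p_c, rng):
    """Recombine two parents with probability p_c (assumed crossover probability).

    When crossover fires, a random binary mask decides, element by element,
    which parent each child inherits from. Otherwise the parents are copied.
    """
    if rng.random() >= p_c:
        return list(parent_a), list(parent_b)
    mask = [rng.random() < 0.5 for _ in parent_a]
    child_a = [a if m else b for a, b, m in zip(parent_a, parent_b, mask)]
    child_b = [b if m else a for a, b, m in zip(parent_a, parent_b, mask)]
    return child_a, child_b


def crossover_rate(p_c, trials=2000, seed=0):
    """Estimate the fraction of parent pairs that are actually altered for a given p_c."""
    rng = random.Random(seed)
    parent_a = [0.0] * 16  # hypothetical flattened perturbation
    parent_b = [1.0] * 16
    changed = 0
    for _ in range(trials):
        child_a, _ = crossover(parent_a, parent_b, p_c, rng)
        if child_a != parent_a:
            changed += 1
    return changed / trials


if __name__ == "__main__":
    # Sweep a few candidate values, in the spirit of a parameter adjustment study.
    for p_c in (0.1, 0.5, 0.9):
        print(p_c, crossover_rate(p_c))
```

A small value of the probability leaves most parents unchanged, so the population explores slowly, while a large value recombines almost every pair; a sweep like the one in the `__main__` block is one simple way to examine that trade-off.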
