A Kernelized Manifold Mapping to Diminish the Effect of Adversarial Perturbations

Saeid Asgari Taghanaki; Kumar Abhishek; Shekoofeh Azizi; Ghassan Hamarneh

arXiv:1903.01015·cs.CV·May 10, 2019

TL;DR

This paper introduces a non-linear manifold mapping technique using kernelized features to enhance the robustness of deep convolutional neural networks against adversarial attacks without sacrificing accuracy on clean data.

Contribution

It proposes a novel radial basis convolutional feature mapping that maps features onto a well-separated manifold, reducing adversarial vulnerability.

Findings

1. Improves robustness to gradient and non-gradient attacks
2. Maintains accuracy on clean datasets
3. Outperforms several masking defense strategies

Tables

Table 1: Classification accuracy under different attacks on the MNIST dataset. FGSM: $\epsilon=0.3$; BIM: $\epsilon=0.3$, iterations = 5; MIM: $\epsilon=0.3$, iterations = 10, decay factor = 1; PGD: $\epsilon=0.1$, iterations = 40; C&W: iterations = 50; GN: $\epsilon=20$; SPSA: $\epsilon=0.3$. “n/a” denotes that the corresponding entry was not reported in the respective paper.

Models	Clean	C&W [5] ($L_{2}$)	GN [33] ($L_{2}$)	FGSM [19] ($L_{\infty}$)	BIM [19] ($L_{\infty}$)	MIM [9] ($L_{\infty}$)	PGD [27] ($L_{\infty}$)	SPSA [49] ($L_{\infty}$)
ORIG [30]	0.9930	0.1808	0.7227	0.0968	0.0070	0.0051	0.1365	0.3200
Binary CNN [23]	0.9850	n/a	0.9200	0.7100	0.7000	0.7000	n/a	n/a
NN [23]	0.9690	n/a	0.9100	0.6800	0.4300	0.2600	n/a	n/a
Binary ABS [23]	0.9900	n/a	0.8900	0.8500	0.8600	0.8500	n/a	n/a
ABS [23]	0.9900	n/a	0.9800	0.3400	0.1300	0.1700	n/a	n/a
Fortified Net [20]	0.9893	0.6058	n/a	0.9131	n/a	n/a	0.7954	n/a
PROP	0.9942	0.9879	0.7506	0.8582	0.7887	0.6425	0.8157	0.7092

Table 2: Feature mapping analysis via intra-class compactness and inter-class separability measures on the MNIST dataset for the original 3-layer CNN versus the proposed method. The abbreviated column headers are the Silhouette, Calinski, Mutual Information, Homogeneity, and Completeness metrics, respectively.

	Sil.	Cal.	MI	Homo.	Comp.
ORIG [30]	0.2612	1658.20	0.9695	0.9696	0.9721
PROP	0.4284	2570.42	0.9720	0.9721	0.9815

Table 3: Classification accuracy on CHEST for different attacks and defenses.

	Defense
Attack	Iteration	ORIG	GDA	FSM	PROP
$L_{1}$ BIM [19]	5	0	0	0.55	0.63
$L_{\infty}$ BIM [19]	5	0	0	0.54	0.65
Clean	-	0.74	0.75	0.57	0.74

Table 4: Segmentation results (average Dice $\pm$ standard error) of different defense mechanisms compared to the proposed radial basis feature mapping method for V-Net and U-Net under DAG attack. $10i$ and $30i$ refer to 10 and 30 iterations of the attack, respectively.

Network	Method	Clean	$10i$ (% accuracy drop)	$30i$ (% accuracy drop)
U-Net [34]	ORIG [34]	$0.7743 \pm 0.0202$	$0.5594 \pm 0.0196$ (27.75%)	$0.4396 \pm 0.0222$ (43.23%)
	FSG [55]	$0.7292 \pm 0.0229$	$0.6382 \pm 0.0206$ (15.58%)	$0.5858 \pm 0.0218$ (24.34%)
	FSM [55]	$0.7695 \pm 0.0198$	$0.6039 \pm 0.0199$ (22.01%)	$0.5396 \pm 0.0211$ (30.31%)
	ADVT [14]	$0.6703 \pm 0.0273$	$0.7012 \pm 0.0255$ (9.44%)	$0.6700 \pm 0.0260$ (13.47%)
	PROP	$0.7780 \pm 0.0209$	$0.7619 \pm 0.0208$ (1.60%)	$0.7248 \pm 0.0226$ (6.39%)
V-Net [29]	ORIG [34]	$0.8070 \pm 0.0189$	$0.5320 \pm 0.0207$ (34.10%)	$0.3865 \pm 0.0217$ (52.10%)
	FSG [55]	$0.7886 \pm 0.0205$	$0.6990 \pm 0.0189$ (13.38%)	$0.6840 \pm 0.0188$ (15.24%)
	FSM [55]	$0.8084 \pm 0.0189$	$0.5928 \pm 0.0209$ (26.54%)	$0.5144 \pm 0.0218$ (36.26%)
	ADVT [14]	$0.7924 \pm 0.0162$	$0.7121 \pm 0.0174$ (11.76%)	$0.7113 \pm 0.0179$ (11.85%)
	PROP	$0.8213 \pm 0.0177$	$0.7384 \pm 0.0169$ (8.50%)	$0.6944 \pm 0.0178$ (13.95%)

Table 5: Segmentation Dice $\pm$ standard error scores under black-box attacks; adversarial images were produced with the methods in the first (left) column and tested with the methods in the first row. U-PROP and V-PROP refer to U-Net and V-Net equipped with our mapping method.

-	U-Net [34]	U-PROP	V-Net [29]	V-PROP
U-Net [34]	-	$0.7341 \pm 0.0205$	$0.6364 \pm 0.0189$	$0.7210 \pm 0.0189$
U-PROP	$0.7284 \pm 0.0219$	-	$0.6590 \pm 0.0218$	$0.7262 \pm 0.0241$
V-Net [29]	$0.7649 \pm 0.0168$	$0.7773 \pm 0.0167$	-	$0.7478 \pm 0.2090$
V-PROP	$0.7922 \pm 0.0188$	$0.7964 \pm 0.0192$	$0.6948 \pm 0.0171$	-

Table 6: Ablation study of the usefulness of learning the transformation matrix $\Psi$ and $\beta$ on SKIN for V-Net; mean $\pm$ standard error.

	$\Psi$	$\beta$	Dice	FPR	FNR
Clean	✗	✓	$0.7721 \pm 0.0210$	$0.0149 \pm 0.0022$	$0.2041 \pm 0.0245$
	✓	✗	$0.8200 \pm 0.0163$	$0.0177 \pm 0.0026$	$0.1547 \pm 0.0188$
	✗	✗	$0.8002 \pm 0.0184$	$0.0137 \pm 0.0022$	$0.1883 \pm 0.0211$
	✓	✓	$0.8213 \pm 0.0177$	$0.0141 \pm 0.0020$	$0.1706 \pm 0.0200$
$10 i$	✗	✓	$0.6471 \pm 0.0213$	$0.0437 \pm 0.0052$	$0.1992 \pm 0.0261$
	✓	✗	$0.7010 \pm 0.0161$	$0.0606 \pm 0.0054$	$0.1020 \pm 0.0166$
	✗	✗	$0.6740 \pm 0.0187$	$0.0458 \pm 0.0037$	$0.1472 \pm 0.0217$
	✓	✓	$0.7384 \pm 0.0169$	$0.0444 \pm 0.0041$	$0.1234 \pm 0.0186$
$30 i$	✗	✓	$0.6010 \pm 0.0221$	$0.0371 \pm 0.0030$	$0.2304 \pm 0.0273$
	✓	✗	$0.6458 \pm 0.0180$	$0.0633 \pm 0.0042$	$0.1164 \pm 0.0183$
	✗	✗	$0.6188 \pm 0.0188$	$0.0615 \pm 0.0040$	$0.1384 \pm 0.0205$
	✓	✓	$0.6944 \pm 0.0179$	$0.0418 \pm 0.0038$	$0.1489 \pm 0.0205$

Equations

$$\kappa(x,z)=\phi(x)\cdot\phi(z)\quad\forall x,z\in\mathcal{F}$$

$$\phi^{(l)}_{k}=e^{-\beta^{(l)}_{k}D\left(f^{(l)}_{K^{(l)}},c^{(l)}_{k}\right)}$$

$$D\left(f^{(l)}_{K^{(l)}},c^{(l)}_{k}\right)=\left(f^{(l)}_{K^{(l)}}-c^{(l)}_{k}\right)^{T}\left(\Psi^{(l)}\right)^{-1}\left(f^{(l)}_{K^{(l)}}-c^{(l)}_{k}\right)$$

$$g^{(l)}\left(f^{(l)}_{K^{(l)}}\right)=\sum_{k=1}^{P^{(l)}}\left(w^{(l)}_{k}\cdot\phi^{(l)}_{k}\right)+b^{(l)}$$

$$g^{(l)}\left(f^{(l)}\right)=\sum_{k=1}^{P^{(l)}}\left(w^{(l)}_{k}\cdot\phi^{(l)}_{k}\right)+b^{(l)}=\sum_{k=1}^{P^{(l)}}\left(w^{(l)}_{k}\cdot\phi^{(l)}_{k}\left(f^{(l)}_{k}(\Theta)\right)\right)+b^{(l)}=\sum_{k=1}^{P^{(l)}}w^{(l)}_{k}\cdot\exp\left\{-\beta^{(l)}_{k}\left(f^{(l)}_{k}(\Theta)-c^{(l)}_{k}\right)^{T}\left(\Psi^{(l)}\right)^{-1}\left(f^{(l)}_{k}(\Theta)-c^{(l)}_{k}\right)\right\}+b^{(l)}$$

$$g^{(L)}\left(f^{(L)}\right)=\sum_{k=1}^{P^{(L)}}w^{(L)}_{k}\cdot\exp\left\{-\beta^{(L)}_{k}\left(f^{(L)}_{k}(\Theta)-c^{(L)}_{k}\right)^{T}\left(\Psi^{(L)}\right)^{-1}\left(f^{(L)}_{k}(\Theta)-c^{(L)}_{k}\right)\right\}+b^{(L)}$$

$$y=\left[g^{(L)}\left(f^{(L)}\right),f^{(L)}\right]$$

$$\xi(y)_{i}=\frac{e^{y_{i}}}{\sum_{j}^{Q}e^{y_{j}}}$$

$$\mathcal{L}=-\sum_{i}^{Q}t_{i}\log\left(\xi(y)_{i}\right)$$

$$\mathcal{L}=-\log\left(\frac{e^{y_{p}}}{\sum_{j}^{Q}e^{y_{j}}}\right)$$

$$A^{*},C^{*},\beta^{*},\Theta^{*}=\underset{A,C,\beta,\Theta}{\arg\min}\;\mathcal{L}(A,C,\beta,\Theta).$$

Code & Models

Repositories

asgsaeid/KernelizedManifoldMapping (official)

Full text

A Kernelized Manifold Mapping to Diminish the Effect of Adversarial Perturbations

Saeid Asgari Taghanaki (corresponding author)

School of Computing Science, Simon Fraser University, Canada

{sasgarit, kabhishe, hamarneh}@sfu.ca

Kumar Abhishek

School of Computing Science, Simon Fraser University, Canada

{sasgarit, kabhishe, hamarneh}@sfu.ca

Shekoofeh Azizi

Department of Electrical and Computer Engineering, University of British Columbia, Canada

[email protected]

Ghassan Hamarneh

School of Computing Science, Simon Fraser University, Canada

{sasgarit, kabhishe, hamarneh}@sfu.ca

Abstract

The linear and non-flexible nature of deep convolutional models makes them vulnerable to carefully crafted adversarial perturbations. To tackle this problem, we propose a non-linear radial basis convolutional feature mapping by learning a Mahalanobis-like distance function. Our method then maps the convolutional features onto a linearly well-separated manifold, which prevents small adversarial perturbations from forcing a sample to cross the decision boundary. We test the proposed method on three publicly available image classification and segmentation datasets, namely MNIST, ISBI ISIC 2017 skin lesion segmentation, and NIH Chest X-Ray-14. We evaluate the robustness of our method to different gradient-based (targeted and untargeted) and non-gradient-based attacks and compare it to several non-gradient masking defense strategies. Our results demonstrate that the proposed method can increase the resilience of deep convolutional neural networks to adversarial perturbations without a drop in accuracy on clean data.

1 Introduction

Deep convolutional neural networks (CNNs) are highly vulnerable to adversarial perturbations [14, 43] produced by two main groups of attack mechanisms: white- and black-box, in which the target model parameters and architecture are accessible and hidden, respectively. In order to mitigate the effect of adversarial attacks, two categories of defense techniques have been proposed: data-level and algorithmic-level. Data-level methods include adversarial training [14, 43], pre-/post-processing methods (e.g., feature squeezing) [55], pre-processing using basis functions [39], and noise removal [16, 28]. Algorithmic-level methods [12, 18, 37] modify the deep model or the training algorithm by reducing the magnitude of gradients [32] or blocking/masking gradients [3, 15, 41]. However, these approaches are not completely effective against several different white- and black-box attacks [28, 37, 47] and pre-processing based methods might sacrifice accuracy to gain resilience to attacks that may never happen. Generally, most of these defense strategies cause a drop in the standard accuracy on clean data [48]. For more details on adversarial attacks and defenses, we refer the readers to [56].

Gradient masking has shown sub-optimal performance against different types of adversarial attacks [31, 47]. Athalye et al. [2] identified obfuscated gradients, a special case of gradient masking that leads to a false sense of security in defenses against adversarial perturbations. They showed that 7 out of 9 recent white-box defenses relying on this phenomenon ([3, 8, 15, 25, 37, 41, 53]) are vulnerable to single-step or non-gradient-based attacks. They also listed several symptoms of defenses that rely on obfuscated gradients. Although adversarial training [27] showed reasonable robustness against first-order adversaries, this method has two major drawbacks. First, in practice the attack strategy is unknown and it is difficult to choose appropriate adversarial images for training. Second, the method requires more training time [1]. Furthermore, recent studies [23, 40] show that adversarial training overfits the $L_{\infty}$ metric while remaining highly susceptible to $L_{0}$, $L_{1}$, and $L_{2}$ perturbations.

As explored in the literature, successful adversarial attacks are mainly the result of models being too linear [13, 14, 24, 38] in high dimensional manifolds, causing the decision boundary to be close to the manifold of the training data [45], and/or of the models’ low flexibility [10]. Another hypothesis is that adversarial examples are off the data manifold [22, 37, 41]. To boost the non-linearity of a model, Goodfellow et al. [14] explored a variety of methods, including quadratic units and shallow and deep radial basis function (RBF) networks. They achieved reasonably good performance against adversarial perturbations with shallow RBF networks. However, they found it difficult to train deep RBF models, leading to a high training error using stochastic gradient descent. Fawzi et al. [11] showed that support vector machines with RBF kernels can effectively resist adversarial attacks.

Typically, a single RBF network layer takes a vector $\mathbf{x}\in\mathbb{R}^{n}$ as input and outputs a scalar function of the input vector, $f(\mathbf{x}):\mathbb{R}^{n}\rightarrow\mathbb{R}$, computed as $f\left(\mathbf{x}\right)=\sum_{i=1}^{P}w_{i}e^{-\beta_{i}D(\mathbf{x},\mathbf{c}_{i})}$, where $P$ is the number of neurons in the hidden layer, $\mathbf{c}_{i}$ is the center vector for neuron $i$, $D(\mathbf{x},\mathbf{c}_{i})$ measures the distance between $\mathbf{x}$ and $\mathbf{c}_{i}$, $w_{i}$ weights the output of neuron $i$, and $\beta_{i}$ corresponds to the width of the Gaussian. The input can be made linearly separable with a high probability by transforming it to a higher dimensional space (Cover’s theorem [7]). The Gaussian basis functions, commonly used in RBF networks, are local to the center vectors, i.e., $\lim_{\left\|\mathbf{x}\right\|\rightarrow\infty}e^{-\beta_{i}D(\mathbf{x},\mathbf{c}_{i})}=0$, which in turn implies that a small perturbation $\epsilon$ added to a sample input $\mathbf{x}$ of the neuron has an increasingly smaller effect on the output response as $\mathbf{x}$ becomes farther from the center of that neuron. Traditional RBF networks are normally trained in two sequential steps. First, an unsupervised method, e.g., K-means clustering, is applied to find the RBF centers [52], and second, a linear model with coefficients $w_{i}$ is fit to the desired outputs.
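As an illustration of the single-layer formula $f(\mathbf{x})=\sum_{i}w_{i}e^{-\beta_{i}D(\mathbf{x},\mathbf{c}_{i})}$, the following is a minimal NumPy sketch. It assumes $D$ is the squared Euclidean distance, and the centers, widths, and weights are made-up values chosen only for the example; none of this is taken from the authors' code.

```python
import numpy as np

def rbf_layer(x, centers, betas, weights):
    """Evaluate f(x) = sum_i w_i * exp(-beta_i * D(x, c_i)).

    Assumption: D is the squared Euclidean distance.
    """
    diffs = centers - x                    # (P, n): offset of x from every center
    dists = np.sum(diffs ** 2, axis=1)     # (P,): squared distance to each center
    return float(np.sum(weights * np.exp(-betas * dists)))

# Hypothetical parameters for a layer with P = 2 neurons in R^2.
centers = np.array([[0.0, 0.0], [2.0, 2.0]])
betas = np.array([1.0, 0.5])
weights = np.array([1.0, -1.0])

near = rbf_layer(np.array([0.0, 0.0]), centers, betas, weights)   # on the first center
far = rbf_layer(np.array([50.0, 50.0]), centers, betas, weights)  # far from both centers
```

In this toy setting the response at the first center is dominated by that neuron, while the response far from both centers vanishes, which is the locality property described above.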

To tackle the linearity of current deep CNN models, which persists despite stacking several linear units and using non-linear activation functions [13, 14, 24, 38] and which results in vulnerability to adversarial perturbations, we equip CNNs with radial basis mapping kernels. Radial basis functions with Euclidean distance might not be effective, as the activation of each neuron depends only on the Euclidean distance between a pattern and the neuron center. Also, since the activation function is constrained to be symmetrical, all attributes are considered equally relevant. To address these limitations, we can add flexibility to the model by applying asymmetrical quadratic distance functions, such as the Mahalanobis distance, to the activation function in order to take into account the variability of the attributes and their correlations. However, computing the Mahalanobis distance requires complete knowledge of the attributes, which is not readily available in dynamic environments (e.g., features iteratively updated during CNN training), and the distance is not optimized to maximize the accuracy of the model.

Contributions. In this paper, (I) we propose a new non-linear kernel based Mahalanobis distance-like feature transformation method. (II) Unlike the traditional Mahalanobis formulation, in which a constant, pre-defined covariance matrix is adopted, we propose to learn such a “transformation” matrix $\Psi$ . (III) We propose to learn the RBF centers and the Gaussian widths in our RBF transformers. (IV) Our method adds robustness to attacks without reducing the model’s performance on clean data, which is not the case for previously proposed defenses. (V) We propose a defense mechanism which can be applied to various tasks, e.g., classification, segmentation (both evaluated in this paper), and object detection.

2 Method

In this section, we introduce a novel asymmetrical quadratic distance function (2.1), and then propose a manifold mapping method using the proposed distance function (2.2). Then we discuss the model optimization and manifold parameter learning algorithm (2.3).

2.1 Adaptive kernelized distance calculation

Given a convolutional feature map $f^{(l)}\in\mathcal{F}^{nm}$ of size $n\times m\times K^{(l)}$ for layer $l$ of a CNN, the goal is to map the features onto a new manifold $g^{(l)}\in\mathcal{G}^{nm}$ of size $n\times m\times P^{(l)}$ where classes are more likely to be linearly separable. Towards this end, we leverage an RBF transformer that takes feature vectors of $f^{(l)}_{K^{(l)}}$ as input and maps them onto a linearly separable manifold by learning a transformation matrix $\Psi^{(l)}\in\mathbb{R}^{K^{(l)}\times P^{(l)}}$ and a non-linear kernel $\kappa:\mathcal{F}\times\mathcal{F}\rightarrow\mathbb{R}$ for which there exists a representation manifold $\mathcal{G}$ and a map $\phi:\mathcal{F}\rightarrow\mathcal{G}$ such that

$$\kappa(x,z)=\phi(x)\cdot\phi(z)\quad\forall x,z\in\mathcal{F}$$

The $k^{th}$ RBF neuron activation function is given by

$$\phi^{(l)}_{k}=e^{-\beta^{(l)}_{k}D\left(f^{(l)}_{K^{(l)}},c^{(l)}_{k}\right)}$$

where $c^{(l)}_{k}$ is the $k^{th}$ learnable center, and the learnable parameter $\beta^{(l)}_{k}$ controls the width of the $k^{th}$ Gaussian function. We use a distance metric $D(\cdot)$ inspired by the Mahalanobis distance, which measures the distance between a convolutional feature vector and a center, and is computed as:

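$$D\left(f^{(l)}_{K^{(l)}},c^{(l)}_{k}\right)=\left(f^{(l)}_{K^{(l)}}-c^{(l)}_{k}\right)^{T}\left(\Psi^{(l)}\right)^{-1}\left(f^{(l)}_{K^{(l)}}-c^{(l)}_{k}\right)$$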

where $\Psi^{(l)}$ refers to the learnable transformation matrix of size $K^{(l)}\times P^{(l)}$ . To ensure that the transformation matrix $\Psi^{(l)}$ is positive semi-definite, we set $\Psi^{(l)}=AA^{T}$ and optimize for $A$ .
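A minimal NumPy sketch of the distance $D(f,c)=(f-c)^{T}\Psi^{-1}(f-c)$ with $\Psi=AA^{T}$, and of the resulting activation $\phi=e^{-\beta D}$, might look as follows. Two assumptions are made here that are not stated in the text: $A$ is taken to be a square $K\times K$ matrix so that $\Psi$ can be inverted, and a small `ridge` term (a hypothetical safeguard) is added to keep the linear solve well-conditioned. The function names are illustrative only.

```python
import numpy as np

def mahalanobis_like(f, c, A, ridge=1e-6):
    """D(f, c) = (f - c)^T Psi^{-1} (f - c), with Psi = A A^T.

    Assumptions: A is square (K x K); `ridge` is an added safeguard, not part
    of the formulation in the text.
    """
    psi = A @ A.T + ridge * np.eye(A.shape[0])   # symmetric positive definite
    d = f - c
    return float(d @ np.linalg.solve(psi, d))    # avoids forming Psi^{-1} explicitly

def rbf_activation(f, c, A, beta):
    """phi = exp(-beta * D(f, c)) for one feature vector and one center."""
    return float(np.exp(-beta * mahalanobis_like(f, c, A)))

rng = np.random.default_rng(0)
K = 4                                  # feature dimension (illustrative)
A = rng.normal(size=(K, K))            # stands in for the learned factor of Psi
f = rng.normal(size=K)                 # one convolutional feature vector
c = rng.normal(size=K)                 # one RBF center

d_val = mahalanobis_like(f, c, A)
phi = rbf_activation(f, c, A, beta=0.5)
d_identity = mahalanobis_like(f, c, np.eye(K), ridge=0.0)   # reduces to squared Euclidean
```

With $A$ set to the identity and no ridge, the distance reduces to the squared Euclidean distance, which is the constant-$\Psi$ baseline that the learned matrix generalizes.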

Finally, the transformed feature vector $g^{(l)}(f^{(l)}_{K^{(l)}})$ is computed as:

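$$g^{(l)}\left(f^{(l)}_{K^{(l)}}\right)=\sum_{k=1}^{P^{(l)}}\left(w^{(l)}_{k}\cdot\phi^{(l)}_{k}\right)+b^{(l)}$$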

2.2 The proposed manifold mapping

For any intermediate layer $l$ in the network, we compute the transformed feature vector when using the RBF transformation matrix as

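$$g^{(l)}\left(f^{(l)}\right)=\sum_{k=1}^{P^{(l)}}\left(w^{(l)}_{k}\cdot\phi^{(l)}_{k}\right)+b^{(l)}=\sum_{k=1}^{P^{(l)}}\left(w^{(l)}_{k}\cdot\phi^{(l)}_{k}\left(f^{(l)}_{k}(\Theta)\right)\right)+b^{(l)}=\sum_{k=1}^{P^{(l)}}w^{(l)}_{k}\cdot\exp\left\{-\beta^{(l)}_{k}\left(f^{(l)}_{k}(\Theta)-c^{(l)}_{k}\right)^{T}\left(\Psi^{(l)}\right)^{-1}\left(f^{(l)}_{k}(\Theta)-c^{(l)}_{k}\right)\right\}+b^{(l)}$$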

where $f^{(l)}$ is a function of the CNN parameters $\Theta$ .

The detailed diagram of the mapping step is shown in Figure 1. In each layer $l$ of the network, the output feature maps of the convolutional block are concatenated with the transformed feature maps (Figure 2).

For the last layer (layer $L$ ) of the network, as shown in the green bounding box in Figure 2, the expression becomes

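$$g^{(L)}\left(f^{(L)}\right)=\sum_{k=1}^{P^{(L)}}w^{(L)}_{k}\cdot\exp\left\{-\beta^{(L)}_{k}\left(f^{(L)}_{k}(\Theta)-c^{(L)}_{k}\right)^{T}\left(\Psi^{(L)}\right)^{-1}\left(f^{(L)}_{k}(\Theta)-c^{(L)}_{k}\right)\right\}+b^{(L)}$$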

The concatenation output of the last layer is given by

$$y=\left[g^{(L)}\left(f^{(L)}\right),f^{(L)}\right]$$

2.3 Model optimization and mapping parameter learning

All the RBF parameters, i.e., the transformation matrix $\Psi^{(l)}$ , the RBF centers $C^{(l)}=\{c^{(l)}_{1},c^{(l)}_{2},\cdots,c^{(l)}_{N^{(l)}}\}$ (where $N^{(l)}$ denotes the number of RBF centers in layer $l$ ), and the widths of the Gaussians $\beta_{i}^{(l)}$ , along with all the CNN parameters $\Theta$ , are learned end-to-end using back-propagation. This approach results in the local RBF centers being adjusted optimally as they are updated based on the whole network parameters.

The categorical cross entropy loss is calculated using the softmax activation applied to the concatenation output of the last layer. The softmax activation function for the $i^{th}$ class, denoted by $\xi(y)_{i}$ , is defined as

$$\xi(y)_{i}=\frac{e^{y_{i}}}{\sum_{j}^{Q}e^{y_{j}}}$$

where $Q$ represents the total number of classes. The loss is therefore defined by

$$\mathcal{L}=-\sum_{i}^{Q}t_{i}\log\left(\xi(y)_{i}\right)$$

Since the multi-class classification labels are one-hot encoded, the expression for the loss contains only the element of the target vector $\mathbf{t}$ which is not zero, i.e., $t_{p}$ , where $p$ denotes the positive class. This expression can be simplified by discarding the summation elements which are zero because of the target labels, and we get

$$\mathcal{L}=-\log\left(\frac{e^{y_{p}}}{\sum_{j}^{Q}e^{y_{j}}}\right)$$

where $y_{p}$ represents the network output for the positive class.
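The simplification of the cross-entropy loss for one-hot targets can be checked numerically. The sketch below, with made-up logits for $Q=3$ classes, computes the loss once with the full sum over classes and once using only the positive-class term; the max-shift inside `softmax` is a standard numerical-stability step added here and does not change the result.

```python
import numpy as np

def softmax(y):
    """xi(y)_i = exp(y_i) / sum_j exp(y_j), shifted by max(y) for stability."""
    e = np.exp(y - np.max(y))
    return e / e.sum()

def cross_entropy(y, t):
    """Full form: L = -sum_i t_i * log(xi(y)_i)."""
    return float(-np.sum(t * np.log(softmax(y))))

def cross_entropy_positive(y, p):
    """Simplified form for one-hot targets: L = -log(xi(y)_p)."""
    return float(-np.log(softmax(y)[p]))

y = np.array([2.0, -1.0, 0.5])     # illustrative network outputs
t = np.array([1.0, 0.0, 0.0])      # one-hot target, positive class p = 0

full = cross_entropy(y, t)
short = cross_entropy_positive(y, 0)
```

Both forms give the same value because every term with $t_{i}=0$ drops out of the sum.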

Therefore, we define a loss function $\mathcal{L}$ encoding the classification error in the transformed manifold and seek $A^{*}$ , $\beta^{*}$ , $C^{*}$ , and $\Theta^{*}$ that minimize $\mathcal{L}$ :

$$A^{*},C^{*},\beta^{*},\Theta^{*}=\underset{A,C,\beta,\Theta}{\arg\min}\;\mathcal{L}(A,C,\beta,\Theta).$$

3 Data

We conduct two sets of experiments: image (i) classification and (ii) segmentation. (i) For the image classification experiments, we use the MNIST [21] and the NIH Chest X-Ray-14 [51] (hereafter referred to as CHEST) datasets, where the latter comprises 112,120 gray-scale images with 14 disease labels and one ‘no (clinical) finding’ label. We treat all the disease classes as positive and formulate a binary classification task. We randomly selected 90,000 images for training: 45,000 images with the “positive” label and the remaining 45,000 with the “negative” label. The validation set comprised 13,332 images, with 6,666 images of each label. We randomly picked 200 unseen images as the test set, with 93 images of the positive class and 107 images of the negative class. These clean (test) images are used for carrying out different adversarial attacks, and the models trained on clean images are evaluated against them. (ii) For the image segmentation experiments, we use the 2D RGB skin lesion dataset from the 2017 IEEE ISBI International Skin Imaging Collaboration (ISIC) Challenge [6] (hereafter referred to as SKIN). We trained on a set of 2,000 images and tested on an unseen set of 150 images.

4 Experiments and results

In this section, we report the results of several experiments for the two tasks of classification and segmentation. We start with MNIST, as it has been used extensively for evaluating adversarial attacks and defenses. Next, we show how the proposed method is applicable to another classification dataset and to a segmentation task.

4.1 Evaluation on classification task

In this section, we analyze the performance of the proposed method on two different classification datasets, MNIST and CHEST. In Table 1, we report the results of the proposed method on the MNIST dataset when attacked by different targeted and untargeted attacks, i.e., fast gradient sign method (FGSM) [19], basic iterative method (BIM) [19], projected gradient descent (PGD) [27], Carlini & Wagner method (C&W) [5], and momentum iterative method (MIM) [9] (the winner of the NIPS 2017 adversarial attacks competition). The proposed method (i.e., PROP) successfully resists all the attacks (with both $L_{\infty}$ and $L_{2}$ perturbations) for which the 3-layer CNN (i.e., ORIG) network almost completely fails, e.g., for the strongest attack (i.e., MIM) the proposed method achieves $64.25\%$ accuracy while the original CNN network obtains almost zero ( $0.58\%$ ) accuracy. Further, we test the proposed method with two non-gradient-based attacks, simultaneous perturbation stochastic approximation (SPSA) [49] and Gaussian additive noise (GN) [33], to show that the robustness of the proposed method is not because of gradient masking. We compare our results to other defenses, e.g., Binary CNN, Nearest Neighbour (NN) model, Analysis by Synthesis (ABS), Binary ABS [23], and Fortified Networks [20]. Looking at the Binary ABS and ABS results in Table 1, the former generally outperforms the latter, but it should be noted that Binary ABS is applicable only to simple datasets, e.g., MNIST, as it leverages binarization. Although Fortified Net outperforms our method for the FGSM attack, it has been tested only on gradient-based attacks, and therefore it is unclear how it would perform against gradient-free attacks such as SPSA and GN.
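For readers unfamiliar with the simplest of these attacks, the following sketch applies one FGSM step, $x_{adv}=x+\epsilon\cdot\mathrm{sign}(\nabla_{x}\mathcal{L})$, to a toy logistic-regression model whose input gradient is available in closed form. The model, its weights, and the clipping of inputs to $[0,1]$ are assumptions made for this example; the experiments in the paper use the CleverHans implementations on CNNs, not this code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad_x(x, label, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(w @ x + b)
    loss = -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))
    grad_x = (p - label) * w               # closed-form d(loss)/dx for this model
    return float(loss), grad_x

def fgsm(x, label, w, b, eps):
    """One FGSM step: move each input component by eps in the loss-increasing direction."""
    _, grad_x = loss_and_grad_x(x, label, w, b)
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)   # assumed valid input range [0, 1]

# Hypothetical toy model and input.
w = np.array([2.0, -3.0, 1.0])
b = 0.1
x = np.array([0.8, 0.2, 0.5])
label = 1.0

clean_loss, _ = loss_and_grad_x(x, label, w, b)
x_adv = fgsm(x, label, w, b, eps=0.3)      # eps = 0.3 mirrors the MNIST setting in Table 1
adv_loss, _ = loss_and_grad_x(x_adv, label, w, b)
```

In this example the perturbation stays within an $L_{\infty}$ ball of radius 0.3 and flips the sign of the logit, which is the failure mode a more linear model is prone to.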

To ensure that the robustness of the proposed method is not due to masked/obfuscated gradients, as suggested by [2], we test the proposed feature mapping method against several characteristic behaviors of defenses that cause obfuscated gradients to occur. a) As reported in Table 1, one-step attacks (e.g., FGSM) did not perform better than iterative attacks (e.g., BIM, MIM); b) according to Tables 4 and 5, black-box attacks did not perform better than white-box ones; c) as shown in Figure 3 (a and b), larger distortion factors monotonically increase the attack success rate; d) the proposed method performs well against gradient-free attacks, e.g., GN and SPSA. Subplot (b) in Figure 3 also indicates that the robustness of the proposed method is not because of numerical instability of gradients.

Next, to quantify the compactness and separability of different clusters/classes, we evaluate the features produced by the ORIG and PROP methods with clustering evaluation techniques such as mutual information based score [50], homogeneity and completeness [35], Silhouette coefficient [36], and Calinski-Harabasz index [4]. Both the Silhouette coefficient and the Calinski-Harabasz index quantify how well clusters are separated from each other and how compact they are without taking into account the ground truth labels, while the mutual information based, homogeneity, and completeness scores evaluate clusters based on labels. As reported in Table 2, when the original CNN network applies radial basis feature mapping, it achieves considerably higher scores (for all the metrics, higher values are better). As both the original and the proposed method achieved high classification test accuracy, i.e., $\sim 99\%$, the differences in the label-based scores, i.e., mutual information based, homogeneity, and completeness, are small.
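To make one of these measures concrete, the sketch below computes the Calinski-Harabasz index, the ratio of between-cluster to within-cluster dispersion, each normalized by its degrees of freedom, directly in NumPy. The two synthetic feature sets and their cluster assignments are invented for illustration and are unrelated to the MNIST features analyzed in Table 2.

```python
import numpy as np

def calinski_harabasz(X, assignments):
    """CH = [B / (k - 1)] / [W / (n - k)], with B and W the between- and
    within-cluster sums of squared distances."""
    X = np.asarray(X, dtype=float)
    assignments = np.asarray(assignments)
    clusters = np.unique(assignments)
    n, k = X.shape[0], clusters.size
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for cluster in clusters:
        members = X[assignments == cluster]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return float((between / (k - 1)) / (within / (n - k)))

# Synthetic 2-D "features": same cluster centers, different spread.
rng = np.random.default_rng(0)
assignments = np.repeat([0, 1, 2], 50)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
noise = rng.normal(size=(150, 2))
loose = centers[assignments] + 1.5 * noise
tight = centers[assignments] + 0.5 * noise

ch_loose = calinski_harabasz(loose, assignments)
ch_tight = calinski_harabasz(tight, assignments)
```

Tighter clusters around the same centers yield a larger index, which is the direction of change reported for PROP relative to ORIG.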

In Figure 4, we visualize the feature spaces of each layer in a simple 3-layer CNN using t-SNE [26] and PCA [17] methods by reducing the high dimensional feature space into two dimensions. As can be seen, the proposed radial basis feature mapping reduces intra-class and increases inter-class distances.

To test the robustness of the proposed method on CHEST, we follow the strategy of Taghanaki et al. [44]. We select Inception-ResNet-v2 [42] and modify it with the proposed radial basis function blocks. Following the study by Taghanaki et al. [44], we focus on the most effective attacks in terms of imperceptibility and power, i.e., gradient-based attacks (basic iterative method [19]: BIM and L1-BIM). We also compare the proposed method with two defense strategies: Gaussian data augmentation (GDA) [57] and feature squeezing (FSM) [55]. GDA is a data augmentation technique that augments a dataset with copies of the original samples to which Gaussian noise has been added. The FSM method reduces the precision of the components of the original input by encoding them with a smaller number of bits. In Table 3, we report the classification accuracy of different attacks and defenses (including PROP) on CHEST.
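The two baseline defenses described above can be sketched in a few lines of NumPy. This is an illustrative reading of the one-sentence descriptions, not the implementations used in the experiments: the noise level `sigma`, the bit depth, the image range of $[0,1]$, and the function names are all assumptions.

```python
import numpy as np

def gaussian_augment(images, sigma, rng):
    """GDA: append noisy copies of the originals (assumed pixel range [0, 1])."""
    noisy = np.clip(images + rng.normal(scale=sigma, size=images.shape), 0.0, 1.0)
    return np.concatenate([images, noisy], axis=0)

def squeeze_bit_depth(images, bits):
    """Feature squeezing by precision reduction: quantize to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(images * levels) / levels

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))                 # four random 8x8 "images" in [0, 1)

augmented = gaussian_augment(images, sigma=0.1, rng=rng)
squeezed = squeeze_bit_depth(images, bits=3)   # at most 8 distinct intensity values
```

Both operate on the input only, which is why they can be combined with any classifier but may also cost accuracy on clean data, as the Clean row of Table 3 shows for FSM.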

4.2 Evaluation on segmentation task

To assess the segmentation vulnerability to adversarial attacks, we apply the dense adversary generation (DAG) method proposed by Xie et al. [54] to two state-of-the-art segmentation networks, U-Net [34] and V-Net [29], under both white- and black-box conditions. We compare the proposed feature mapping method to other defense strategies, e.g., Gaussian and median feature squeezing [55] (FSG and FSM, respectively) and adversarial training [14] (ADVT), on SKIN. From Table 4, it can be seen that the proposed method is more robust to adversarial attacks, and when applied to U-Net, its performance deteriorates much less than that of the next best method (ADVT): 1.60% and 6.39% vs 9.44% and 13.47% after 10 and 30 iterations of the attack with $\gamma=0.03$. Similar performance was also observed for V-Net, where the accuracy drop of the proposed method using the same $\gamma$ for 10 iterations was 8.50%, while the next best method (ADVT) dropped by 11.76%. It should also be noted that applying the feature mapping led to an improvement in the segmentation accuracy on clean (non-attacked/unperturbed) images: the performance increased to 0.7780 from the original 0.7743 for U-Net, and to 0.8213 from the original 0.8070 for V-Net.
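The text does not spell out how the percentage drops are computed. The quoted U-Net figures are reproduced if the drop is measured relative to the clean Dice of the undefended network (ORIG, 0.7743) rather than each defense's own clean score; the sketch below works under that inferred assumption.

```python
def relative_drop(reference_clean, attacked):
    """Percentage drop of an attacked score relative to a clean reference score."""
    return 100.0 * (reference_clean - attacked) / reference_clean

unet_orig_clean = 0.7743                             # ORIG U-Net Dice on clean images (Table 4)

prop_10 = relative_drop(unet_orig_clean, 0.7619)     # PROP after 10 DAG iterations
advt_10 = relative_drop(unet_orig_clean, 0.7012)     # ADVT after 10 DAG iterations
```

Rounded to two decimals these give 1.60% and 9.44%, matching the values quoted for PROP and ADVT after 10 iterations.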

Figure 5 visualizes the segmentation results of a few samples from SKIN for different defense strategies. As shown, the proposed method obtains results closer to the ISIC ground truth (GT) than all other methods. Although adversarial training (ADVT) also produces promising segmentation results, it requires knowledge of the adversary in order to perform robust optimization, which is almost impossible to obtain in practice since the attack is unknown. The proposed method has no such dependency.

As reported in Table 5, under black-box attack, the proposed method is the best performing method across all 12 experiments except for one, in which the accuracy of the best method was just $0.0022$ higher (i.e., $0.7284\pm 0.2682$ vs $0.7262\pm 0.2621$ ). However, it should be noted that the standard deviation of the winner is larger than that of the proposed method.

Next, we analyze the usefulness of learning the transformation matrix ( $\Psi$ ) and the width of the Gaussian ( $\beta$ ) in our Mahalanobis-like distance calculation. As can be seen in Table 6, in all the cases, i.e., testing with clean images and with images after 10 and 30 iterations of attack, our method with learned $\Psi$ and $\beta$ achieved the highest Dice score.

Figure 6 shows how the $\Psi$ of a single layer converges after a few epochs on the MNIST dataset. The Y-axis is the Frobenius norm of the change in $\Psi$ between two consecutive epochs. Adopting the value to which $\Psi$ converges results in superior performance (e.g., mean Dice 0.8213 vs 0.7721, as reported in Table 6) compared to when we do not optimize for $\Psi$, i.e., hold it constant (identity matrix).
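The convergence measure on the Y-axis, the Frobenius norm of the change in $\Psi$ between consecutive epochs, can be computed as in the sketch below. The sequence of $\Psi$ snapshots is synthetic: it starts at the identity and moves halfway toward an arbitrary target matrix at each step, purely to illustrate the quantity being plotted.

```python
import numpy as np

def psi_change(psi_prev, psi_next):
    """Frobenius norm of the change in Psi between two consecutive epochs."""
    return float(np.linalg.norm(psi_next - psi_prev, ord="fro"))

# Synthetic snapshots (not training output): Psi_t = target + 0.5**t * (I - target).
target = np.array([[2.0, 0.5], [0.5, 1.0]])
delta = np.eye(2) - target
snapshots = [target + (0.5 ** t) * delta for t in range(6)]

changes = [psi_change(a, b) for a, b in zip(snapshots, snapshots[1:])]
```

For this synthetic sequence the change halves at every step, so the curve decays toward zero in the same qualitative way as a converging $\Psi$.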

5 Implementation details

5.1 Classification experiments

$\bullet$ MNIST experiments: For both the original CNN with 3-layers (i.e., ORIG) and the proposed method (i.e., PROP), we used a batch size of 128 and Adam optimizer with learning rate of 0.001.

$\bullet$ CHEST: Inception-ResNet-v2 network was trained with a batch size of 4 with RMSProp optimizer [46] with a decay of 0.9 and $\epsilon=1$ and an initial learning rate of 0.045, decayed every 2 epochs using an exponential rate of 0.94.
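The CHEST learning-rate schedule (initial rate 0.045, multiplied by 0.94 every 2 epochs) can be written out as below. The sketch assumes a staircase schedule, i.e., the rate is held constant within each 2-epoch window; the text does not say whether the decay was applied in steps or continuously.

```python
def learning_rate(epoch, base_lr=0.045, decay_rate=0.94, epochs_per_decay=2):
    """Staircase exponential decay (assumed): lr = base_lr * decay_rate ** (epoch // 2)."""
    return base_lr * decay_rate ** (epoch // epochs_per_decay)

# Learning rate for the first six epochs (0-indexed).
schedule = [learning_rate(epoch) for epoch in range(6)]
```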

For all the gradient based attacks applied in the classification part, we used the CleverHans library [30], and for the Gaussian additive noise attack, we used FoolBox [33].

5.2 Segmentation experiments

For both U-Net and V-Net, we used a batch size of 16, ADADELTA optimizer with learning rate of 1.0, rho = 0.95, and decay = 0.0. We tested the DAG method with 10 and 30 iterations and perturbation factor $\gamma=0.03$ . For FSM and FSG defenses, we applied a window size of $3\times 3$ and a standard deviation of 1.0, respectively.

6 Conclusion

We proposed a nonlinear radial basis feature mapping method to transform layer-wise convolutional features into a new manifold, where improved class separation reduces the effectiveness of perturbations when attempting to fool the model. We evaluated the model under white- and black-box attacks for two different tasks of image classification and segmentation and compared our method to other non-gradient based defenses. We also performed several tests to ensure that the robustness of the proposed method is neither because of numerical instability of gradients nor because of gradient masking. In contrast to previous methods, our proposed feature mapping improved the classification and segmentation accuracy on both clean and perturbed images.

Acknowledgement

Partial funding for this project is provided by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors are grateful to the NVIDIA Corporation for donating a Titan X GPU used in this research.

Bibliography

The reference list from the paper itself.

[1] S. R. Adnan, H. Zhezhi, G. Boqing, and F. Deliang. Blind pre-processing: A robust defense method against adversarial examples. arXiv preprint arXiv:1802.01549, 2018.
[2] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
[3] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. International Conference on Learning Representations, 2018.
[4] T. Caliński and J. Harabasz. A dendrite method for cluster analysis. Communications in Statistics-theory and Methods, 3(1):1–27, 1974.
[5] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
[6] N. C. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A. Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1710.05006, 2017.
[7] T. M. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, EC-14(3):326–334, June 1965.
[8] G. S. Dhillon, K. Azizzadenesheli, Z. C. Lipton, J. Bernstein, J. Kossaifi, A. Khanna, and A. Anandkumar. Stochastic activation pruning for robust adversarial defense. arXiv preprint arXiv:1803.01442, 2018.