Adversarial Example Decomposition

Horace He; Aaron Lou; Qingxuan Jiang; Isay Katsman; Serge Belongie; Ser-Nam Lim

arXiv:1812.01198·stat.ML·June 24, 2019


TL;DR

This paper introduces a method to decompose adversarial examples into architecture, data, and noise components, revealing their transferability properties and enabling improved adversarial transferability.

Contribution

It proposes a novel decomposition of adversarial examples into three bias sources, enhancing understanding and transferability of adversarial attacks.

Findings

1. Noise-dependent components transfer poorly across models.
2. Architecture-dependent components transfer better among same-architecture models.
3. Recombining components improves transferability without losing original efficacy.

Abstract

Research has shown that widely used deep neural networks are vulnerable to carefully crafted adversarial perturbations. Moreover, these adversarial perturbations often transfer across models. We hypothesize that adversarial weakness is composed of three sources of bias: architecture, dataset, and random initialization. We show that one can decompose adversarial examples into an architecture-dependent component, data-dependent component, and noise-dependent component and that these components behave intuitively. For example, noise-dependent components transfer poorly to all other models, while architecture-dependent components transfer better to retrained models with the same architecture. In addition, we demonstrate that these components can be recombined to improve transferability without sacrificing efficacy on the original model.

Tables

Table 1: $\Delta x_{noise}$ decomposition (ResNet18)

$\Delta$	$\mathcal{M}_{orig}$	$\{\mathcal{M}_{avg}\}$	$\{\mathcal{M}_{test}\}$
$\Delta x$	68.3%	45.6%	46.7%
$\Delta x_{nr}$	63.7%	61.9%	59.5%
$\Delta x_{noise}$	60.2%	19.8%	20.3%

Table 3: Linear combinations of $\Delta\widehat{x}_{noise}$ and $\Delta\widehat{x}_{nr}$

$b:a$	$\mathcal{M}_{orig}$	$\{\mathcal{M}_{avg}\}$	$\mathcal{M}_{test}$
$\Delta x_{nr}$	65.8%	63.6%	65.1%
2:1	68.5%	63.7%	65.2%
1.5:1	69.4%	61.2%	62.8%
1:1	69.8%	56.0%	56.4%
1:2	70.0%	53.1%	53.5%
$\Delta x$	69.8%	51.0%	51.0%

Table 4: $\Delta x_{arch}$ decomposition

Source	$\Delta$	$\{\mathcal{M}_{source}\}$	$\{\mathcal{M}_{other}\}$	$\{\mathcal{M}^{\prime}_{source}\}$	$\{\mathcal{M}^{\prime}_{other}\}$
ResNet18	$\Delta x_{nr}$	60.9%	50.7%	59.4%	50.7%
ResNet18	$\Delta x_{data}$	54.6%	61.4%	54.8%	60.8%
ResNet18	$\Delta x_{arch}$	52.4%	26.7%	36.9%	30.3%
DenseNet121	$\Delta x_{nr}$	62.8%	46.2%	62.9%	47.1%
DenseNet121	$\Delta x_{data}$	58.4%	58.3%	57.2%	55.7%
DenseNet121	$\Delta x_{arch}$	54.1%	24.4%	43.1%	26.0%
GoogLeNet	$\Delta x_{nr}$	65.3%	41.9%	65.7%	41.9%
GoogLeNet	$\Delta x_{data}$	59.5%	59.2%	59.5%	58.3%
GoogLeNet	$\Delta x_{arch}$	57.9%	22.8%	44.8%	26.2%
SENet18	$\Delta x_{nr}$	53.8%	48.4%	53.2%	49.0%
SENet18	$\Delta x_{data}$	55.7%	64.5%	54.8%	63.8%
SENet18	$\Delta x_{arch}$	47.1%	28.1%	38.6%	29.8%

Table 5: Varying $\alpha$

$\alpha$	$\mathcal{M}_{orig}$	$\{\mathcal{M}_{avg}\}$	Difference (pp)
$\Delta x$	68.3%	45.6%	22.7
$\Delta x_{nr}$	63.7%	61.9%	1.8
0.1	66.7%	42.2%	24.5
0.5	63.4%	29.1%	34.3
0.8	59.7%	21.4%	38.3
1.0	52.7%	16.6%	36.1
1.2	51.9%	11.3%	40.6
1.5	42.9%	9.4%	33.5
2.0	33.5%	7.2%	26.3

Table 6: Varying $\epsilon$

$\epsilon$	$\Delta$	$\mathcal{M}_{orig}$	$\{\mathcal{M}_{avg}\}$	$\{\mathcal{M}_{test}\}$
0.01	$\Delta x$	39.0%	16.4%	14.4%
0.01	$\Delta x_{nr}$	25.1%	28.4%	22.2%
0.01	$\Delta x_{noise}$	26.6%	6.2%	5.3%
0.03	$\Delta x$	68.3%	45.6%	46.7%
0.03	$\Delta x_{nr}$	63.7%	61.9%	59.5%
0.03	$\Delta x_{noise}$	60.2%	19.8%	20.3%
0.06	$\Delta x$	81.2%	69.7%	73.6%
0.06	$\Delta x_{nr}$	81.1%	80.5%	85.8%
0.06	$\Delta x_{noise}$	77.7%	39.4%	40.0%

Table 7: Varying number of iterations used for iFGSM

# of iters	$\Delta$	$\mathcal{M}_{orig}$	$\{\mathcal{M}_{avg}\}$
5	$\Delta x$	65.2%	43.4%
5	$\Delta x_{nr}$	58.8%	58.8%
5	$\Delta x_{noise}$	55.5%	20.6%
10	$\Delta x$	68.3%	46.7%
10	$\Delta x_{nr}$	63.7%	61.9%
10	$\Delta x_{noise}$	60.2%	19.8%
100	$\Delta x$	72.9%	48.6%
100	$\Delta x_{nr}$	67.3%	65.2%
100	$\Delta x_{noise}$	60.3%	18.7%

Table 8: Varying number of models used to approximate $\Delta x_{nr}$

# of models	$\Delta$	$\mathcal{M}_{orig}$	$\{\mathcal{M}_{avg}\}$	$\{\mathcal{M}_{test}\}$
3	$\Delta x$	69.4%	46.6%	45.6%
3	$\Delta x_{nr}$	57.6%	62.1%	51.9%
3	$\Delta x_{noise}$	60.1%	24.9%	29.2%
5	$\Delta x$	68.4%	47.0%	44.8%
5	$\Delta x_{nr}$	60.1%	62.0%	55.2%
5	$\Delta x_{noise}$	57.5%	22.4%	24.6%
10	$\Delta x$	68.3%	45.6%	46.7%
10	$\Delta x_{nr}$	63.7%	61.9%	59.5%
10	$\Delta x_{noise}$	60.2%	19.8%	20.3%

Equations

$$\Delta x_{nr}\approx A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j},\mathcal{L},\mathbf{0}\Big),\qquad \Delta x_{noise}\approx\Delta x-P_{\Delta x_{nr}}(\Delta x) \tag{1}$$

$$\Delta x_{data}\approx\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\Big[A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}^{i},\mathcal{L},\Delta x_{nr}\Big)\Big],\qquad \Delta x_{arch}\approx\Delta x_{nr}-P_{\Delta x_{data}}(\Delta x_{nr}) \tag{2}$$

$$\Delta x^{j}=\Delta x_{noise}^{j}+\Delta x_{arch}+\Delta x_{data}$$

$$\mathbb{E}_{j}[\Delta x^{j}]=\Delta x_{arch}+\Delta x_{data}+\mathbb{E}_{j}[\Delta x_{noise}^{j}]=\Delta x_{arch}+\Delta x_{data}=\Delta x_{nr}$$

$$\frac{1}{n}\sum_{j=1}^{n}\Delta x^{j}\approx\Delta x_{nr}$$

$$\mathcal{L}\Big(\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}(x),y\Big)=\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{L}(\mathcal{M}_{j}(x),y)\implies A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j},\mathcal{L},\mathbf{0}\Big)=\frac{1}{n-1}\sum_{j=2}^{n}A(x,y,\mathcal{M}_{j},\mathcal{L},\mathbf{0})\approx\Delta x_{nr}$$

$$\Delta x_{noise}+P_{\Delta x_{nr}}(\Delta x)\approx\Delta x\implies\Delta x_{noise}\approx\Delta x-P_{\Delta x_{nr}}(\Delta x)$$

$$\mathbb{E}_{\mathcal{A}}[\Delta x_{nr}]=\mathbb{E}_{\mathcal{A}}[\Delta x_{arch}]+\mathbb{E}_{\mathcal{A}}[\Delta x_{data}]=\mathbb{E}_{\mathcal{A}}[\Delta x_{data}]$$

$$\frac{1}{n}\sum_{i=1}^{n}\Delta x_{nr}^{i}\approx\Delta x_{data}$$

$$\Delta x_{data}=\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}[\Delta x_{nr}^{i}]=\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\,\mathbb{E}_{j}[A(x,y,\mathcal{M}_{j}^{i},\mathcal{L},\mathbf{0})]\approx\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\Big[A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}^{i},\mathcal{L},\mathbf{0}\Big)\Big]$$

$$\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\Big[A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}^{i},\mathcal{L},\Delta x_{nr}\Big)\Big]$$

$$\Delta x_{arch}+P_{\Delta x_{data}}(\Delta x_{nr})\approx\Delta x_{nr}\implies\Delta x_{arch}\approx\Delta x_{nr}-P_{\Delta x_{data}}(\Delta x_{nr})$$


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security

Full text


Machine Learning, ICML

1 Introduction

Due to their recent successes on a wide variety of tasks, neural networks are now widely applied in the real world. However, recent work has shown that they fail catastrophically in the presence of adversarially perturbed input (Szegedy et al., 2013; Goodfellow et al., 2014). Moreover, Szegedy et al. (2013) and Goodfellow et al. (2014) showed that inputs adversarially generated for one model often cause other models to misclassify images as well, a phenomenon commonly called transferability.

Our understanding of the causes of transferability is fairly limited. Tramèr et al. (2017) analyze the local similarity of decision boundaries and define a local decision boundary metric that indicates how likely adversarial examples are to transfer between two models. However, many questions remain open. Wu et al. (2018) recently hypothesized that adversarial perturbations can be decomposed into initialization-specific and data-dependent components, and that the data-dependent component is primarily what contributes to transfer. However, Wu et al. (2018) provide neither theoretical nor empirical evidence to justify this hypothesis.

Our work aims to examine this hypothesis in greater detail. We first extend the previous hypothesis to a decomposition into three parts: architecture-dependent, data-dependent, and noise-dependent components. Given this framework, our contributions are as follows:

• We propose a method for decomposing adversarial perturbations into noise-dependent and noise-reduced components.

• We also present a method to further decompose the noise-reduced component into architecture-dependent and data-dependent components.

• We conduct extensive experiments on CIFAR-10 (Krizhevsky, 2009) using various architectures to show that the above two decompositions have the desired properties. We also report an ablation study that shows the significance of the nontrivial choices made in our methodology.

2 Motivation and Approach

Motivated by the reviewers’ comments on Wu et al. (2018), we seek to provide further evidence that an adversarial example can be decomposed into model-dependent and data-dependent portions. First, we augment the hypothesis to claim that an adversarial perturbation can be decomposed into architecture-dependent, data-dependent, and noise-dependent components. We note that these three sources are the only things that could contribute in some way to the adversarial example. Figure 2 gives an intuition for why noise-dependent components exist and why they would not transfer despite working on the original dataset.

Not drawn explicitly in the figure is the architecture-dependent component. As neural networks induce biases in the decision boundary, and specific network architectures induce specific biases, we would expect that an adversarial example could exploit these biases across all models with the same architecture.

2.1 Notation

We denote by $\mathcal{A}=\{\mathcal{A}_{0},\mathcal{A}_{1},\dots,\mathcal{A}_{k}\}$ the set of model architectures. Let $\mathcal{M}^{i}=\{\mathcal{M}_{\alpha}^{i}\}$ be a set of fully trained models of architecture $\mathcal{A}_{i}$ initialized with random noise. The superscript will be omitted when the architecture is clear.

We define an attack $A(x,y,\mathcal{M}_{j}^{i},\mathcal{L},\Delta x_{0})=\Delta x$, where $x$ is an image, $y$ its corresponding label, $\mathcal{M}_{j}^{i}$ a neural network model as defined above, $\mathcal{L}$ a loss function, $\Delta x_{0}$ the initial perturbation of $x$, and $\Delta x$ a perturbation of $x$ such that $\mathcal{L}(\mathcal{M}_{j}^{i}(x+\Delta x),y)$ is maximal.

For fixed architecture $\mathcal{A}_{i}$, model $\mathcal{M}_{j}^{i}$, and attack $A$, we denote by $\Delta x_{noise},\Delta x_{arch},\Delta x_{data}$ the three components of $\Delta x$ introduced in previous sections. Let $\Delta x_{noise\ reduced}=\Delta x_{arch}+\Delta x_{data}$; we will use the shorthand $\Delta x_{nr}$.

Let $P_{x_{1}}(x_{2})$ denote the projection of vector $x_{2}$ onto vector $x_{1}$. Let $\widehat{x}=\frac{x}{||x||}$ be the unit vector with the same direction as $x$.
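The two vector operations defined above can be sketched in a few lines. This is a minimal illustration that assumes perturbations are flattened into plain Python lists of floats; the helper names `dot`, `norm`, `unit`, and `project` are ours and do not come from the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm ||u||."""
    return math.sqrt(dot(u, u))

def unit(u):
    """Return u / ||u||, the unit vector with the same direction as u."""
    n = norm(u)
    if n == 0.0:
        raise ValueError("the zero vector has no direction")
    return [a / n for a in u]

def project(onto, v):
    """P_onto(v): projection of v onto the line spanned by `onto`."""
    scale = dot(v, onto) / dot(onto, onto)
    return [scale * a for a in onto]

# Toy vectors chosen for illustration only.
x1 = [3.0, 0.0]
x2 = [2.0, 5.0]
p = project(x1, x2)                          # component of x2 along x1
residual = [b - a for a, b in zip(p, x2)]    # what is left after removing it
```

The residual is orthogonal to `x1` by construction, which is the property the decompositions in the following sections rely on.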

2.2 $\Delta x_{noise}$ and $\Delta x_{nr}$ Decomposition

**Description:** We fix our architecture $\mathcal{A}_{0}$ and have $\{\mathcal{M}_{1},\dots,\mathcal{M}_{n}\}$ as our set of trained models. Set $\mathcal{L}$ to be the cross-entropy loss and let $\Delta x=A(x,y,\mathcal{M}_{1},\mathcal{L},\mathbf{0})$ be the generated adversarial perturbation for $\mathcal{M}_{1}$.

**Proposition:** $\Delta x$ can be decomposed into $\Delta x_{noise}+\Delta x_{nr}$ such that the attack $\Delta x_{noise}$ is effective on $\mathcal{M}_{1}$ but transfers poorly to $\mathcal{M}_{2},\dots,\mathcal{M}_{n}$, while $\Delta x_{nr}$ transfers well to all models.

The equations for computing $\Delta x_{nr}$ and $\Delta x_{noise}$ are given in Equation 1 (see Appendix C for justification). The technique is illustrated in Figure 1.

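$$\Delta x_{nr}\approx A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j},\mathcal{L},\mathbf{0}\Big),\qquad \Delta x_{noise}\approx\Delta x-P_{\Delta x_{nr}}(\Delta x) \tag{1}$$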
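As a rough end-to-end illustration of Equation 1, the sketch below replaces the neural networks with toy linear classifiers and replaces iFGSM with a single normalized-gradient step on the mean loss of the ensemble. Both substitutions are our simplifications, as are the helper names and the synthetic "shared weights plus per-model noise" setup; only the final projection step mirrors the decomposition in the text.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(onto, v):
    """Projection of v onto the line spanned by `onto`."""
    scale = dot(v, onto) / dot(onto, onto)
    return [scale * a for a in onto]

def loss_grad(w, x, y):
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-y * w.x)), y in {-1, +1}."""
    coeff = -y / (1.0 + math.exp(y * dot(w, x)))
    return [coeff * wi for wi in w]

def attack(models, x, y, eps):
    """Toy stand-in for A: one normalized-gradient step on the mean loss of `models`."""
    grads = [loss_grad(w, x, y) for w in models]
    mean = [sum(g[k] for g in grads) / len(grads) for k in range(len(x))]
    length = math.sqrt(dot(mean, mean))
    return [eps * g / length for g in mean]

rng = random.Random(0)
dim, n = 20, 10
shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
# Hypothetical "retrained" models: shared weights plus per-model noise.
models = [[s + rng.gauss(0.0, 1.0) for s in shared] for _ in range(n)]
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
y = 1

dx = attack(models[:1], x, y, eps=0.03)       # Delta x, generated on M_1 only
dx_nr = attack(models[1:], x, y, eps=0.03)    # Delta x_nr, from M_2 .. M_n
dx_noise = [a - b for a, b in zip(dx, project(dx_nr, dx))]
```

By construction `dx_noise` is orthogonal to `dx_nr`, and adding the projection back recovers `dx`. Whether the noise component actually transfers poorly is an empirical question that this toy setup does not settle.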

2.3 $\Delta x_{arch}$ and $\Delta x_{data}$ Decomposition

**Description:** We reuse notation from the above section, except that we now consider a set of different architectures $\mathcal{A}=\{\mathcal{A}_{0},\mathcal{A}_{1},\dots,\mathcal{A}_{k}\}$.

**Proposition:** $\Delta x_{nr}$ can be decomposed into $\Delta x_{arch}+\Delta x_{data}$ such that the attack $\Delta x_{arch}$ is effective on $\mathcal{A}_{0}$ but transfers poorly to $\mathcal{A}_{1},\dots,\mathcal{A}_{k}$, while $\Delta x_{data}$ transfers well to all models.

The equations for computing $\Delta x_{arch}$ and $\Delta x_{data}$ are given in Equation 2, in which we set $\Delta x_{nr}$ to be the noise-reduced perturbation generated on $\mathcal{A}_{0}$ (see Appendix C for justification). We approximate the expectation for $\Delta x_{data}$ by averaging across architectures.

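$$\Delta x_{data}\approx\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\Big[A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}^{i},\mathcal{L},\Delta x_{nr}\Big)\Big],\qquad \Delta x_{arch}\approx\Delta x_{nr}-P_{\Delta x_{data}}(\Delta x_{nr}) \tag{2}$$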
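The second decomposition has the same shape as the first. The sketch below assumes the per-architecture noise-reduced perturbations have already been computed and simply averages them to stand in for $\Delta x_{data}$; it skips the warm start at $\Delta x_{nr}$ used in Equation 2, and the dictionary keys and numeric values are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(onto, v):
    """Projection of v onto the line spanned by `onto`."""
    scale = dot(v, onto) / dot(onto, onto)
    return [scale * a for a in onto]

def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[k] for v in vectors) / count for k in range(len(vectors[0]))]

# Hypothetical noise-reduced perturbations, one per architecture (toy 3-D values).
dx_nr_by_arch = {
    "A0": [0.9, 0.1, 0.3],
    "A1": [1.1, -0.2, 0.0],
    "A2": [1.0, 0.1, -0.3],
}

dx_nr = dx_nr_by_arch["A0"]                            # source architecture A_0
dx_data = mean_vector(list(dx_nr_by_arch.values()))    # average across architectures
dx_arch = [a - b for a, b in zip(dx_nr, project(dx_data, dx_nr))]
```

Here `dx_arch` is whatever remains of the source architecture's perturbation after removing the direction shared across architectures.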

3 Results

We empirically verify the approaches given in the motivation above and show that the isolated noise-dependent and architecture-dependent perturbations exhibit the desired properties. Unless stated otherwise, all perturbations are generated on CIFAR-10 (Krizhevsky, 2009) (original images rescaled to $[-1,1]$) using iFGSM (Kurakin et al., 2016) with 10 iterations, distance metric $L_{\infty}$, and $\epsilon=0.03$. All experiments are run on the first 2000 CIFAR-10 test images. In addition, all models are trained for only 10 epochs due to computational constraints. All percentages reported are fooling ratios (Moosavi-Dezfooli et al., 2017). For results with other settings, see Appendix A.
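For readers unfamiliar with iFGSM, the following is a minimal sketch of the iteration under an $L_{\infty}$ budget. It uses a toy linear classifier with logistic loss in place of a neural network, and the step size of $\epsilon$ divided by the iteration count is our assumption; the paper does not state the step size it used.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(w, x, y):
    """Logistic loss of a toy linear model with weights w, label y in {-1, +1}."""
    return math.log1p(math.exp(-y * dot(w, x)))

def loss_grad(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    coeff = -y / (1.0 + math.exp(y * dot(w, x)))
    return [coeff * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def ifgsm(w, x, y, eps=0.03, iters=10):
    """Iterative FGSM under an L-infinity budget eps; returns the perturbation."""
    step = eps / iters                       # assumed step size
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_grad(w, x_adv, y)
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around x and the image range [-1, 1].
        x_adv = [clip(clip(xa, xi - eps, xi + eps), -1.0, 1.0)
                 for xa, xi in zip(x_adv, x)]
    return [xa - xi for xa, xi in zip(x_adv, x)]

w = [0.5, -1.0, 2.0]      # hypothetical model weights
x = [0.2, -0.1, 0.4]      # hypothetical input in [-1, 1]
delta = ifgsm(w, x, y=1)
```

Each step moves every coordinate in the direction that increases the loss, and the clipping keeps the result inside both the $\epsilon$-ball and the valid input range.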

3.1 $\Delta x_{noise}$ and $\Delta x_{nr}$ Decomposition

We start with a set of 10 retrained ResNet18 (He et al., 2016) models $\{\mathcal{M}_{i}\}$. We attack the first ResNet18 model $\mathcal{M}_{1}$ ($=\mathcal{M}_{orig}$) to get a perturbation $\Delta x$ for a given $x$. We then follow the process outlined in Equation 1 to obtain $\Delta x_{nr}$ from the other 9 retrained ResNet18 models $\{\mathcal{M}_{i>1}\}$ ($=\{\mathcal{M}_{avg}\}$). Finally, we test on an untouched set of 5 retrained ResNet18 models $\{\mathcal{M}_{test}\}$. We repeat the same process with DenseNet121 (Huang et al., 2017) in place of ResNet18 and report the respective results in Tables 1 and 2.

We note that $\Delta x_{noise}$ achieves a far lower transfer rate than either $\Delta x_{nr}$ or $\Delta x$ while still maintaining a relatively high error rate on the original model, providing evidence for the success of this decomposition. To the best of our knowledge, this is the first methodology able to construct adversarial examples with especially low transferability. Although this is of little practical use, it is theoretically interesting. Although we generate $\Delta x_{nr}$ by multi-fooling across 9 retrained models, reducing noise in high dimensions is difficult, so we are unable to achieve a perfect decomposition of $\Delta x_{noise}$. Ablation studies in Appendix B suggest that a larger set of retrained models may yield a better decomposition.
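The transfer comparison above rests on the fooling ratio, which we read from the cited definition as the fraction of images whose predicted label changes once the perturbation is added. The sketch below computes it for a source model and a second model; the linear classifiers, images, and perturbation values are hypothetical and chosen only to make the computation easy to follow.

```python
def predict(w, x):
    """Toy binary classifier: the sign of a linear score."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1

def fooling_ratio(w, images, perturbations):
    """Fraction of images whose predicted label changes under the perturbation."""
    flipped = 0
    for x, dx in zip(images, perturbations):
        x_adv = [a + b for a, b in zip(x, dx)]
        if predict(w, x_adv) != predict(w, x):
            flipped += 1
    return flipped / len(images)

w_orig = [1.0, 0.0]       # hypothetical source model
w_other = [0.0, 1.0]      # hypothetical retrained model
images = [[0.1, 0.5], [0.2, 0.5], [0.5, 0.5], [-0.1, 0.5]]
perturbations = [[-0.3, 0.0]] * 4

ratio_orig = fooling_ratio(w_orig, images, perturbations)    # 0.5
ratio_other = fooling_ratio(w_other, images, perturbations)  # 0.0
```

In this toy case the perturbation only moves the coordinate that the source model looks at, so it fools the source model on half of the images and transfers to the other model on none of them.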

3.1.1 Recombining components

As the components $\Delta\widehat{x}_{noise}$ and $\Delta\widehat{x}_{nr}$ are linearly independent unit vectors and, by definition, $\Delta x$ is in their span, we can find unique scalars $a$ and $b$ such that $a\cdot\Delta\widehat{x}_{noise}+b\cdot\Delta\widehat{x}_{nr}=\Delta x$. Experimentally, we find that under our setting, $a\approx 1.319$ and $b\approx 0.386$. For our original perturbation, this is perhaps an undue amount of focus on the noise-specific component. We can now set $a$ and $b$ to different ratios, which correspond to how much we wish to emphasize attacking the original model versus transferability. Because $a$ and $b$ can now be made arbitrarily large, which lets us saturate the epsilon constraint, we sign-maximize (i.e., take $sign(x)\cdot\epsilon$, as motivated in Goodfellow et al. (2014)) to level the playing field. Table 3 shows the results of these experiments on ResNet18. We find that we are able to generate perturbations that perform equivalently to $\Delta x$ on $\mathcal{M}_{orig}$ while performing substantially better when transferring to $\{\mathcal{M}_{avg}\}$ and $\mathcal{M}_{test}$.
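The recombination step can be sketched as follows. The code assumes the two unit directions are exactly orthogonal (as they are when the noise component comes from the projection step), so that the coefficients are plain inner products; the function names and the toy 4-dimensional vectors are ours.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def unit(u):
    """Unit vector with the same direction as u."""
    n = math.sqrt(dot(u, u))
    return [p / n for p in u]

def sign(v):
    return (v > 0) - (v < 0)

def coefficients(dx, noise_hat, nr_hat):
    """Scalars (a, b) with a*noise_hat + b*nr_hat = dx, assuming orthonormal directions."""
    return dot(dx, noise_hat), dot(dx, nr_hat)

def recombine(noise_hat, nr_hat, a, b, eps):
    """Mix the two directions with weights (a, b), then sign-maximize to +/- eps."""
    mixed = [a * n + b * r for n, r in zip(noise_hat, nr_hat)]
    return [eps * sign(m) for m in mixed]

noise_hat = unit([1.0, -1.0, 0.0, 0.0])
nr_hat = unit([1.0, 1.0, 1.0, 1.0])
dx = [1.3 * n + 0.4 * r for n, r in zip(noise_hat, nr_hat)]

a, b = coefficients(dx, noise_hat, nr_hat)     # recovers roughly (1.3, 0.4)
boosted = recombine(noise_hat, nr_hat, a=1.0, b=2.0, eps=0.03)   # ratio b:a = 2:1
```

Note that a coordinate whose mixed value is exactly zero stays unperturbed under this `sign`; how such ties were handled in the experiments is not stated in the text.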

3.2 $\Delta x_{arch}$ and $\Delta x_{data}$ Decomposition

To evaluate the decomposition into architecture-specific and data-specific components, we consider the four architectures ResNet18 (He et al., 2016), GoogLeNet (Szegedy et al., 2015), DenseNet121 (Huang et al., 2017), and SENet18 (Hu et al., 2017). Results are given in Table 4. In each experiment we first fix a source architecture $\mathcal{A}_{i}$ and generate $\Delta x_{nr}$ by attacking 4 retrained copies of $\mathcal{A}_{i}$, denoted as $\{\mathcal{M}_{source}\}$. We then generate $\Delta x_{data}$ by attacking four copies of each $\mathcal{A}_{j\neq i}$, for twelve models in total. We then test on another 4 retrained copies of $\mathcal{A}_{i}$, called $\{\mathcal{M}^{\prime}_{source}\}$, as well as on $\{\mathcal{M}^{\prime}_{other}\}$, consisting of four copies of each of the other three architectures $\{\mathcal{A}_{j\neq i}\}$. We see that for all four source architectures, $\Delta x_{arch}$ obtains a significantly higher error rate on $\{\mathcal{M}^{\prime}_{source}\}$ than on $\{\mathcal{M}^{\prime}_{other}\}$. In addition, the relative error between $\{\mathcal{M}^{\prime}_{source}\}$ and $\{\mathcal{M}^{\prime}_{other}\}$ for $\Delta x_{arch}$ is close to the relative error between $\{\mathcal{M}_{source}\}$ and $\{\mathcal{M}_{other}\}$ for $\Delta x_{nr}$ when averaged across models, supporting the success of our decomposition.

3.3 Ablation

**Orthogonality:** We assume that the $\Delta x_{noise}$, $\Delta x_{arch}$, and $\Delta x_{data}$ terms are orthogonal. We note that if these vectors had no relation to each other, then by the properties of high-dimensional space they would be approximately orthogonal with very high probability.
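The high-dimensional claim is easy to check numerically. The sketch below draws two independent Gaussian vectors with as many coordinates as a flattened CIFAR-10 image and measures their cosine similarity; independent Gaussian vectors are our stand-in for "vectors with no relation to each other", not something the paper samples.

```python
import math
import random

def cosine(u, v):
    """Cosine of the angle between u and v."""
    inner = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return inner / (norm_u * norm_v)

rng = random.Random(0)
dim = 3 * 32 * 32            # size of a flattened CIFAR-10 image
u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
v = [rng.gauss(0.0, 1.0) for _ in range(dim)]

cos_uv = cosine(u, v)        # typically on the order of 1 / sqrt(dim)
```

For this dimension the cosine concentrates around zero with a spread of roughly $1/\sqrt{3072}\approx 0.018$, so unrelated directions are nearly orthogonal.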

We vary orthogonality by modifying the method in Section 2.2 to generate $\Delta x_{noise}$ as $\Delta x-\alpha P_{\Delta x_{nr}}(\Delta x)$. When $\alpha=1$, we recover the original algorithm, and when $\alpha=0$, $\Delta x_{noise}=\Delta x$. Experimentally varying the orthogonality of $\Delta x_{noise}$ and $\Delta x_{nr}$ produces the results in Table 5. We achieve the greatest difference in efficacy between the original model and transferred models when the two components are near-orthogonal, suggesting that the assumption we made is reasonable.

However, exactly orthogonal components do not achieve the best isolation: the peak difference appears to be at $\alpha=1.2$. This suggests that our current method of decomposition may only approximate the true components, and that a more nuanced method may be necessary for better isolation.
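To see how $\alpha$ controls the angle between the two components, the sketch below applies the modified formula to a toy pair of vectors and records the cosine between the resulting noise component and $\Delta x_{nr}$. The vectors and the set of $\alpha$ values are illustrative choices of ours.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(onto, v):
    """Projection of v onto the line spanned by `onto`."""
    scale = dot(v, onto) / dot(onto, onto)
    return [scale * a for a in onto]

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def noise_component(dx, dx_nr, alpha):
    """Delta x - alpha * P_{Delta x_nr}(Delta x)."""
    proj = project(dx_nr, dx)
    return [d - alpha * p for d, p in zip(dx, proj)]

dx_nr = [1.0, 0.0, 0.0]      # toy noise-reduced direction
dx = [0.6, 0.8, 0.0]         # toy original perturbation

cosines = {alpha: cosine(noise_component(dx, dx_nr, alpha), dx_nr)
           for alpha in (0.0, 0.5, 1.0, 1.2, 2.0)}
```

The cosine is positive for $\alpha<1$, zero at $\alpha=1$, and negative for $\alpha>1$, so values such as $\alpha=1.2$ push the noise component slightly against the noise-reduced direction.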

**Number of Models:** We find that the more models we use to approximate $\Delta x_{nr}$, the more successfully we are able to isolate $\Delta x_{noise}$. See Appendix B for full results.

4 Conclusion

We demonstrate that it is possible to decompose adversarial perturbations into noise-dependent and data-dependent components, a hypothesis that reviewers found interesting but unsupported in Wu et al. (2018). We go further by decomposing an adversarial perturbation into model-related, data-related, and noise-related perturbations. A major contribution here is a new method of analyzing adversarial examples, which opens many potential directions for future research. One interesting direction would be extending these decompositions to universal perturbations (Moosavi-Dezfooli et al., 2017; Poursaeed et al., 2017), thus removing the dependence on individual data points. Another avenue to explore is analyzing various attacks and defenses and how they interact with these components.

A. Different attack settings

To show that our decomposition is effective across a variety of attack settings, we perform the experiment of Section 3.1 with three different iFGSM settings corresponding to $\epsilon=0.01,0.03,0.06$. Results are shown in Table 6.

B. Varying number of models/iterations

We investigate the effectiveness of the Section 3.1 decomposition as we vary hyper-parameters. Results for increasing the number of iFGSM iterations are given in Table 7, and results for increasing the number of models are given in Table 8.

C. Justification of Equations

Justification of Equations in Section 2.2

Recall that the equations are given by

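$$\Delta x_{nr}\approx A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j},\mathcal{L},\mathbf{0}\Big),\qquad \Delta x_{noise}\approx\Delta x-P_{\Delta x_{nr}}(\Delta x)$$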

We assume that the expected value of our noise term $\Delta x_{noise}$ is $\mathbf{0}$ over all random noise. This is motivated by the fact that the random noise at initialization follows a Gaussian distribution centered at $0$, and it is reasonable to assume that the model distribution and the noise distribution follow a similar pattern.

Letting $\Delta x^{j}=A(x,y,\mathcal{M}_{j},\mathcal{L},\mathbf{0})$ for each random initialization $j$, we claim that $\mathbb{E}_{j}[\Delta x^{j}]=\Delta x_{arch}+\Delta x_{data}$. Since $\Delta x_{arch}$ and $\Delta x_{data}$ are noise independent, we can write

$$\Delta x^{j}=\Delta x_{noise}^{j}+\Delta x_{arch}+\Delta x_{data}$$

where $\Delta x_{noise}^{j}$ is the noise component corresponding with the noise of model $\mathcal{M}_{j}$ . Therefore, it follows that

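$$\mathbb{E}_{j}[\Delta x^{j}]=\Delta x_{arch}+\Delta x_{data}+\mathbb{E}_{j}[\Delta x_{noise}^{j}]=\Delta x_{arch}+\Delta x_{data}=\Delta x_{nr}$$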

By the law of large numbers, it follows that $\displaystyle{\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}}\Delta x^{j}=\Delta x_{nr}$ . Therefore, we note that, for sufficiently large $n$ , it follows that

$$\frac{1}{n}\sum_{j=1}^{n}\Delta x^{j}\approx\Delta x_{nr}$$
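The averaging argument can be illustrated with synthetic data. The sketch below assumes, as in the derivation, that each per-model perturbation is a shared component plus zero-mean noise; the Gaussian noise model, the dimension, and the sample sizes are our choices for illustration.

```python
import math
import random

rng = random.Random(0)
dim = 50
dx_nr = [rng.gauss(0.0, 1.0) for _ in range(dim)]    # hypothetical shared component

def sample_perturbation():
    """Delta x^j = Delta x_nr + zero-mean noise, following the stated assumption."""
    return [s + rng.gauss(0.0, 1.0) for s in dx_nr]

def mean_error(n):
    """Euclidean distance between the n-sample mean and dx_nr."""
    samples = [sample_perturbation() for _ in range(n)]
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mean, dx_nr)))

errors = {n: mean_error(n) for n in (10, 1000)}
```

The error shrinks roughly like $1/\sqrt{n}$, which is consistent with the observation elsewhere in the paper that more retrained models give a cleaner estimate of $\Delta x_{nr}$.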

Since the cross-entropy loss $\mathcal{L}$ is additive and the attacks $A$ that we examine are first-order differentiation methods, we have

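$$\mathcal{L}\Big(\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}(x),y\Big)=\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{L}(\mathcal{M}_{j}(x),y)\implies A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j},\mathcal{L},\mathbf{0}\Big)=\frac{1}{n-1}\sum_{j=2}^{n}A(x,y,\mathcal{M}_{j},\mathcal{L},\mathbf{0})\approx\Delta x_{nr}$$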

To prove the other claim, we rely on the empirical results and on the intuition that $\Delta x_{noise}$ and $\Delta x_{nr}$ are linearly independent, which together indicate that $\Delta x_{noise}$ and $\Delta x_{nr}$ are very close to orthogonal and compose $\Delta x$. Therefore, projecting $\Delta x$ onto $\Delta x_{nr}$ gives

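$$\Delta x_{noise}+P_{\Delta x_{nr}}(\Delta x)\approx\Delta x\implies\Delta x_{noise}\approx\Delta x-P_{\Delta x_{nr}}(\Delta x)$$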

up to a scaling constant.

Justification of Equations in Section 2.3

Recall that the equations are, given $\Delta x_{nr}$ generated on $\mathcal{A}_{0}$,

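$$\Delta x_{data}\approx\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\Big[A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}^{i},\mathcal{L},\Delta x_{nr}\Big)\Big],\qquad \Delta x_{arch}\approx\Delta x_{nr}-P_{\Delta x_{data}}(\Delta x_{nr})$$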

We make two core assumptions:

• $\mathbb{E}_{\mathcal{A}}[\Delta x_{arch}]=0$. This is a reasonable assumption since our architectures $\mathcal{A}$ should produce roughly symmetric error vectors $\Delta x_{arch}$.

• $A(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j},\mathcal{L},\Delta x^{\prime})$ is equivalent to $A(x,y,\mathcal{M},\mathcal{L},0)$ in the sense that the former produces a noise-reduced gradient closer to $\Delta x^{\prime}$. This is reasonable because there are many adversarial perturbations (in different directions), so changing our start location will not cripple our search space. Furthermore, we use this to generate a $\Delta x_{nr}$ close to $\Delta x^{\prime}$.

We claim that $\mathbb{E}_{\mathcal{A}}[\Delta x_{nr}]=\Delta x_{data}$ where we take $\Delta x_{nr}$ over architecture $\mathcal{A}$ . To see this, we note that

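$$\mathbb{E}_{\mathcal{A}}[\Delta x_{nr}]=\mathbb{E}_{\mathcal{A}}[\Delta x_{arch}]+\mathbb{E}_{\mathcal{A}}[\Delta x_{data}]=\mathbb{E}_{\mathcal{A}}[\Delta x_{data}]$$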

and so again we can approximate it with $\displaystyle{\lim_{n\to\infty}}\frac{1}{n}\sum_{i=1}^{n}\Delta x_{nr}^{i}=\Delta x_{data}$, where $\Delta x_{nr}^{i}$ is the $\Delta x_{nr}$ component generated for architecture $\mathcal{A}_{i}$. For sufficiently large $n$, it follows that

$$\frac{1}{n}\sum_{i=1}^{n}\Delta x_{nr}^{i}\approx\Delta x_{data}$$

Therefore we have

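$$\Delta x_{data}=\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}[\Delta x_{nr}^{i}]=\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\,\mathbb{E}_{j}[A(x,y,\mathcal{M}_{j}^{i},\mathcal{L},\mathbf{0})]\approx\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\Big[A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}^{i},\mathcal{L},\mathbf{0}\Big)\Big]$$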

and by our assumption this is roughly equivalent to

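$$\mathbb{E}_{\mathcal{A}_{i}\in\mathcal{A}}\Big[A\Big(x,y,\frac{1}{n-1}\sum_{j=2}^{n}\mathcal{M}_{j}^{i},\mathcal{L},\Delta x_{nr}\Big)\Big]$$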

as desired. To prove the other claim, we use an argument analogous to the one above: taking $\Delta x_{arch}$ and $\Delta x_{data}$ to be orthogonal and applying the same projection technique yields

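$$\Delta x_{arch}+P_{\Delta x_{data}}(\Delta x_{nr})\approx\Delta x_{nr}\implies\Delta x_{arch}\approx\Delta x_{nr}-P_{\Delta x_{data}}(\Delta x_{nr})$$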

up to a scaling constant.

Bibliography

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
Hu, J., Shen, L., and Sun, G. Squeeze-and-excitation networks. CoRR, abs/1709.01507, 2017.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269, 2017.
Krizhevsky, A. Learning multiple layers of features from tiny images. 2009.
Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 86–94, 2017.
Poursaeed, O., Katsman, I., Gao, B., and Belongie, S. J. Generative adversarial perturbations. CoRR, abs/1712.02328, 2017.
