Adversarial Example Decomposition
Horace He, Aaron Lou, Qingxuan Jiang, Isay Katsman, Serge Belongie, Ser-Nam Lim

TL;DR
This paper introduces a method to decompose adversarial examples into architecture, data, and noise components, revealing their transferability properties and enabling improved adversarial transferability.
Contribution
It proposes a novel decomposition of adversarial examples into three bias sources, enhancing understanding and transferability of adversarial attacks.
Findings
Noise-dependent components transfer poorly across models.
Architecture-dependent components transfer better among same-architecture models.
Recombining components improves transferability without losing original efficacy.
Abstract
Research has shown that widely used deep neural networks are vulnerable to carefully crafted adversarial perturbations. Moreover, these adversarial perturbations often transfer across models. We hypothesize that adversarial weakness is composed of three sources of bias: architecture, dataset, and random initialization. We show that one can decompose adversarial examples into an architecture-dependent component, data-dependent component, and noise-dependent component and that these components behave intuitively. For example, noise-dependent components transfer poorly to all other models, while architecture-dependent components transfer better to retrained models with the same architecture. In addition, we demonstrate that these components can be recombined to improve transferability without sacrificing efficacy on the original model.
| 68.3% | 45.6% | 46.7% | |
| 63.7% | 61.9% | 59.5% | |
| 60.2% | 19.8% | 20.3% |
| 65.8% | 63.6% | 65.1% | |
| 2:1 | 68.5% | 63.7% | 65.2% |
| 1.5:1 | 69.4% | 61.2% | 62.8% |
| 1:1 | 69.8% | 56.0% | 56.4% |
| 1:2 | 70.0% | 53.1% | 53.5% |
| 69.8% | 51.0% | 51.0% |
| source | | | | | |
|---|---|---|---|---|---|
| ResNet18 | | 60.9% | 50.7% | 59.4% | 50.7% |
| | | 54.6% | 61.4% | 54.8% | 60.8% |
| | | 52.4% | 26.7% | 36.9% | 30.3% |
| DenseNet121 | | 62.8% | 46.2% | 62.9% | 47.1% |
| | | 58.4% | 58.3% | 57.2% | 55.7% |
| | | 54.1% | 24.4% | 43.1% | 26.0% |
| GoogLeNet | | 65.3% | 41.9% | 65.7% | 41.9% |
| | | 59.5% | 59.2% | 59.5% | 58.3% |
| | | 57.9% | 22.8% | 44.8% | 26.2% |
| SENet18 | | 53.8% | 48.4% | 53.2% | 49.0% |
| | | 55.7% | 64.5% | 54.8% | 63.8% |
| | | 47.1% | 28.1% | 38.6% | 29.8% |
| | | | Difference |
|---|---|---|---|
| | 68.3% | 45.6% | 22.7 |
| | 63.7% | 61.9% | 1.8 |
| 0.1 | 66.7% | 42.2% | 24.5 |
| 0.5 | 63.4% | 29.1% | 34.3 |
| 0.8 | 59.7% | 21.4% | 38.3 |
| 1.0 | 52.7% | 16.6% | 36.1 |
| 1.2 | 51.9% | 11.3% | 40.6 |
| 1.5 | 42.9% | 9.4% | 33.5 |
| 2.0 | 33.5% | 7.2% | 26.3 |
| $\epsilon$ | | | | |
|---|---|---|---|---|
| .01 | | 39.0% | 16.4% | 14.4% |
| | | 25.1% | 28.4% | 22.2% |
| | | 26.6% | 6.2% | 5.3% |
| .03 | | 68.3% | 45.6% | 46.7% |
| | | 63.7% | 61.9% | 59.5% |
| | | 60.2% | 19.8% | 20.3% |
| .06 | | 81.2% | 69.7% | 73.6% |
| | | 81.1% | 80.5% | 85.8% |
| | | 77.7% | 39.4% | 40.0% |
| # of iters | | | |
|---|---|---|---|
| 5 | | 65.2% | 43.4% |
| | | 58.8% | 58.8% |
| | | 55.5% | 20.6% |
| 10 | | 68.3% | 46.7% |
| | | 63.7% | 61.9% |
| | | 60.2% | 19.8% |
| 100 | | 72.9% | 48.6% |
| | | 67.3% | 65.2% |
| | | 60.3% | 18.7% |
| # of models | | | | |
|---|---|---|---|---|
| 3 | | 69.4% | 46.6% | 45.6% |
| | | 57.6% | 62.1% | 51.9% |
| | | 60.1% | 24.9% | 29.2% |
| 5 | | 68.4% | 47.0% | 44.8% |
| | | 60.1% | 62.0% | 55.2% |
| | | 57.5% | 22.4% | 24.6% |
| 10 | | 68.3% | 45.6% | 46.7% |
| | | 63.7% | 61.9% | 59.5% |
| | | 60.2% | 19.8% | 20.3% |
Machine Learning, ICML
1 Introduction
Due to the recent successes of neural networks on a wide variety of tasks, they are now being widely applied in the real world. However, despite their major successes, recent works have shown that in the presence of adversarially perturbed input, they fail catastrophically (Szegedy et al., 2013; Goodfellow et al., 2014). Moreover, Szegedy et al. (2013) and Goodfellow et al. (2014) showed that inputs adversarially generated for one model often cause other models to misclassify images as well, a phenomenon commonly called transferability.
Our understanding of the causes of transferability is fairly limited. Tramèr et al. (2017) analyze the local similarity of decision boundaries to define a local decision boundary metric that determines how transferable adversarial examples between two models are likely to be. However, many questions are still open. The recent work of Wu et al. (2018) hypothesized that adversarial perturbations could be decomposed into initialization-specific and data-dependent components. It is also hypothesized that the data-dependent component is primarily what contributes to transfer. However, Wu et al. (2018) provide neither theoretical nor empirical evidence to justify this hypothesis.
Our work aims to examine this hypothesis in greater detail. We first augment the previous hypothesis to provide a decomposition into three parts: architecture-dependent, data-dependent, and noise-dependent components. Given this framework, our contributions are as follows:
- We propose a method for decomposing adversarial perturbations into noise-dependent and noise-reduced components.
- We also present a method to further decompose the noise-reduced component into architecture-dependent and data-dependent components.
- Extensive experiments are conducted on CIFAR-10 (Krizhevsky, 2009) using various architectures to show the above two decompositions have the desired properties. Results from an ablation study are given to show the significance of the nontrivial choices made in our methodology.
2 Motivation and Approach
Motivated by the reviewers’ comments on Wu et al. (2018), we seek to provide further evidence that an adversarial example can be decomposed into model-dependent and data-dependent portions. First, we augment our hypothesis to claim that an adversarial perturbation can be decomposed into architecture-dependent, data-dependent, and noise-dependent components. We note that these are the only factors that could contribute in some way to the adversarial example. An intuition for why noise-dependent components exist, and why they would not transfer despite working on the original dataset, is shown in Figure 2.
Not drawn explicitly in the figure is the architecture-dependent component. As neural networks induce biases in the decision boundary, and specific network architectures induce specific biases, we would expect that an adversarial example could exploit these biases across all models with the same architecture.
2.1 Notation
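We denote by $\mathcal{A}$ the set of model architectures. Let $M^a = \{m^a_1, \dots, m^a_n\}$ be a set of fully trained models of architecture $a \in \mathcal{A}$, each initialized with random noise. The superscript will be omitted when the architecture is clear.
We define an attack $\mathrm{Att}(x, y, m, \ell, \delta_0) = \delta$, where $x$ is an image, $y$ its corresponding label, $m$ is a neural network model as defined above, $\ell$ is a loss function, $\delta_0$ the initial perturbation of $x$, and $\delta$ a perturbation of $x$ such that $\ell(m(x + \delta), y)$ is maximal.
For fixed architecture $a$, model $m$, and attack $\mathrm{Att}$, we denote by $\delta_a$, $\delta_d$, and $\delta_n$ the architecture-dependent, data-dependent, and noise-dependent components of $\delta$ introduced in previous sections. Let $\delta_{nr} = \delta_a + \delta_d$; we will use this shorthand for the noise-reduced component.
Let $\mathrm{proj}_v u$ denote the projection of vector $u$ onto vector $v$. Let $\hat{v}$ be the unit vector with the same direction as $v$.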
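The projection and unit-vector operations defined above can be sketched in a few lines of plain Python. The helper names `dot`, `proj`, and `unit` are illustrative choices, not names from the paper, and the vectors are toy values rather than image-sized perturbations.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def proj(u: Vector, v: Vector) -> Vector:
    """Projection of u onto v: (<u, v> / <v, v>) * v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * b for b in v]


def unit(v: Vector) -> Vector:
    """Unit vector with the same direction as v."""
    norm = math.sqrt(dot(v, v))
    return [b / norm for b in v]


u = [3.0, 4.0]
v = [2.0, 0.0]
print(proj(u, v))  # [3.0, 0.0]
print(unit(u))     # [0.6, 0.8]
```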
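2.2 Noise-Reduced ($\delta_{nr}$) and Noise-Dependent ($\delta_n$) Decomposition
**Description:** We fix our architecture $a$ and have $M = \{m_1, \dots, m_n\}$ as our set of trained models. We set $\ell$ to be the cross-entropy loss and let $\delta$ be the adversarial perturbation generated for $m_1$.
**Proposition:** $\delta$ can be decomposed into a noise-reduced component $\delta_{nr}$ and a noise-dependent component $\delta_n$ such that the attack $\delta_n$ is effective on $m_1$ but transfers poorly to $m_2, \dots, m_n$, while $\delta_{nr}$ transfers well on all models.
The equations for computing $\delta_{nr}$ and $\delta_n$ are given in Equation 1 (see Appendix C for justification). The technique is illustrated in Figure 1.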
[TABLE]
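As a rough illustration of the projection step described in this section and in Appendix C, the sketch below removes from a perturbation its projection onto a noise-reduced perturbation, leaving a residual that plays the role of the noise-dependent component. It assumes the noise-reduced perturbation has already been produced (in the paper, by attacking the other retrained models); the function name `split_noise` and the toy vectors are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def split_noise(delta: Vector, delta_nr: Vector) -> Tuple[Vector, Vector]:
    """Split delta into its projection onto delta_nr and the orthogonal residual."""
    scale = dot(delta, delta_nr) / dot(delta_nr, delta_nr)
    along = [scale * b for b in delta_nr]
    delta_n = [a - p for a, p in zip(delta, along)]
    return along, delta_n


# Toy 4-dimensional vectors standing in for image-sized perturbations.
delta = [0.03, -0.01, 0.02, 0.00]     # perturbation from attacking one model
delta_nr = [0.02, 0.00, 0.02, 0.01]   # perturbation shared across retrained models
along, delta_n = split_noise(delta, delta_nr)

# The residual is orthogonal to delta_nr up to floating-point error.
print(abs(dot(delta_n, delta_nr)) < 1e-12)
```

The two returned pieces sum back to the original perturbation, so nothing is lost by the split; only the direction shared with the other models is separated out.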
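2.3 Architecture-Dependent ($\delta_a$) and Data-Dependent ($\delta_d$) Decomposition
**Description:** We reuse notation from the above section, except that we now consider a set of different architectures $\{a_1, \dots, a_k\}$.
**Proposition:** The noise-reduced perturbation $\delta_{nr}$ can be decomposed into an architecture-dependent component $\delta_a$ and a data-dependent component $\delta_d$ such that the attack $\delta_a$ is effective on models of architecture $a$ but transfers poorly to models of other architectures, while $\delta_d$ transfers well on all models.
The equations for computing $\delta_a$ and $\delta_d$ are given in Equation 2, in which we set $\delta_{nr}$ to be the noise-reduced perturbation generated on $M^a$ (see Appendix C for justification). We approximate the expectation for $\delta_d$ by averaging across architectures.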
[TABLE]
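A minimal sketch of the averaging view of this decomposition, following the justification in Appendix C: the data-dependent component is approximated by the mean of the noise-reduced perturbations from several architectures, and the architecture-dependent component is the part of one architecture's perturbation orthogonal to that mean. The experiments in the paper obtain the data-dependent component by attacking models of several architectures rather than by literal averaging, so this is an assumption-laden simplification. All function names and numeric values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def architecture_component(delta_nr: Vector, delta_d: Vector) -> Vector:
    """Part of delta_nr that is orthogonal to the data-dependent direction."""
    scale = dot(delta_nr, delta_d) / dot(delta_d, delta_d)
    return [a - scale * b for a, b in zip(delta_nr, delta_d)]


# Toy noise-reduced perturbations, one per architecture.
noise_reduced: Dict[str, Vector] = {
    "ResNet18": [0.03, 0.01, -0.02],
    "DenseNet121": [0.02, 0.02, -0.01],
    "GoogLeNet": [0.03, 0.00, -0.03],
    "SENet18": [0.02, 0.01, -0.02],
}

delta_d = mean_vector(list(noise_reduced.values()))
delta_a = architecture_component(noise_reduced["ResNet18"], delta_d)
print(abs(dot(delta_a, delta_d)) < 1e-12)
```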
3 Results
We empirically verify the approaches given in the motivation above and show that the isolated noise-dependent and architecture-dependent perturbations have the desired properties. Unless stated otherwise, all perturbations are generated on CIFAR-10 (Krizhevsky, 2009) (with the original images rescaled) using iFGSM (Kurakin et al., 2016) with 10 iterations, distance metric $L_\infty$, and $\epsilon = 0.03$. All experiments are run on the first 2000 CIFAR-10 test images. In addition, all models are trained for only 10 epochs due to computational constraints. All percentages reported are fooling ratios (Moosavi-Dezfooli et al., 2017). For results with other settings, see Appendix A.
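To make the attack and the reported metric concrete, here is a minimal sketch of iFGSM and a fooling-ratio computation on a toy linear softmax classifier. It is not the authors' implementation: the linear model, the step size `eps / iters`, the pixel range `[0, 1]`, and all function names are assumptions made for illustration, and the fooling ratio is taken to be the fraction of inputs whose predicted label changes once the perturbation is added.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def logits(W: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(W: Matrix, x: Sequence[float]) -> int:
    z = logits(W, x)
    return max(range(len(z)), key=lambda k: z[k])


def loss_grad(W: Matrix, x: Sequence[float], y: int) -> Vector:
    """Gradient of the softmax cross-entropy loss with respect to the input x."""
    z = logits(W, x)
    z_max = max(z)
    exp_z = [math.exp(v - z_max) for v in z]
    total = sum(exp_z)
    residual = [e / total for e in exp_z]
    residual[y] -= 1.0  # softmax(z) - onehot(y)
    return [sum(residual[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def ifgsm(W: Matrix, x: Vector, y: int, eps: float = 0.03, iters: int = 10) -> Vector:
    """Iterative FGSM under an L-infinity budget eps; returns the perturbation."""
    step = eps / iters  # assumed step size
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_grad(W, x_adv, y)
        for j, g in enumerate(grad):
            moved = x_adv[j] + step * sign(g)
            moved = min(max(moved, x[j] - eps), x[j] + eps)  # stay inside the eps-ball
            x_adv[j] = min(max(moved, 0.0), 1.0)             # stay a valid pixel value
    return [xa - xo for xa, xo in zip(x_adv, x)]


def fooling_ratio(W: Matrix, inputs: List[Vector], perturbations: List[Vector]) -> float:
    """Fraction of inputs whose predicted label changes after perturbation."""
    fooled = sum(
        predict(W, [a + d for a, d in zip(x, delta)]) != predict(W, x)
        for x, delta in zip(inputs, perturbations)
    )
    return fooled / len(inputs)


W = [[1.0, -1.0], [-1.0, 1.0]]  # toy two-class linear model
x = [0.51, 0.50]                # classified as class 0
delta = ifgsm(W, x, y=0)
print(fooling_ratio(W, [x], [delta]))  # 1.0
```

Measuring transfer then amounts to calling `fooling_ratio` with a different model's weights while reusing the same perturbations.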
3.1 Noise-Reduced ($\delta_{nr}$) and Noise-Dependent ($\delta_n$) Decomposition
We start off with a set of 10 retrained ResNet18 (He et al., 2016) models $M = \{m_1, \dots, m_{10}\}$. We attack the first ResNet18 model ($m_1$) to get a perturbation $\delta$ for a given image $x$. We then follow the process outlined in Equation 1 to obtain the noise-reduced component $\delta_{nr}$ from the other 9 retrained ResNet18 models ($m_2, \dots, m_{10}$). We then test on an untouched set of 5 retrained ResNet18 models $M'$. We repeat the same process with DenseNet121 (Huang et al., 2017) in place of ResNet18 and report the respective results in Tables 1 and 2.
We note that the noise-dependent component $\delta_n$ achieves a far lower transfer rate than either $\delta$ or $\delta_{nr}$ while still maintaining a relatively high error rate on the original model, providing evidence for the success of this decomposition. To the best of our knowledge, this is the first methodology that is able to construct adversarial examples with especially low transferability. Although this is of low practical use, it is theoretically interesting. We note that although we attempt to generate $\delta_{nr}$ by multi-fooling across 9 retrained models, reducing noise in high dimensions is difficult, so we are unable to achieve a perfect decomposition of $\delta$. Ablation studies in Appendix B suggest that we may be able to achieve a better decomposition with a larger set of retrained models.
3.1.1 Recombining components
As the components $\hat{\delta}_{nr}$ and $\hat{\delta}_n$ are linearly independent unit vectors and, by definition, $\delta$ is in the span of these vectors, we can find unique scalars $\alpha$ and $\beta$ such that $\delta = \alpha \hat{\delta}_{nr} + \beta \hat{\delta}_n$. Measuring these scalars experimentally under our setting, we note that our original perturbation pays perhaps an undue amount of attention to the noise-specific component. We can now try setting $\alpha$ and $\beta$ to different ratios, which correspond to how much we wish to emphasize attacking the original model versus transferability. As we are now able to set arbitrarily high $\alpha$ and $\beta$, allowing us to saturate the epsilon constraints, we sign-maximize the recombined perturbation (i.e., take $\epsilon \cdot \mathrm{sign}(\alpha \hat{\delta}_{nr} + \beta \hat{\delta}_n)$, as motivated in Goodfellow et al. (2014)) to level the playing field. Table 3 shows the results of performing these experiments on ResNet18. We find that we are able to generate perturbations that perform equivalently to $\delta$ on the original model $m_1$, while performing substantially better when transferring to the other retrained models and to the held-out test models $M'$.
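The recombination step can be sketched as follows: solve for the two scalars that express a perturbation in the basis of the two unit components, then rebuild a perturbation with a chosen weighting and sign-maximize it to the budget. The names `coefficients` and `recombine`, the 2-dimensional toy basis, and the budget value are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def coefficients(delta: Vector, e_nr: Vector, e_n: Vector) -> Tuple[float, float]:
    """Solve delta = alpha * e_nr + beta * e_n via the 2x2 Gram system."""
    g11, g12, g22 = dot(e_nr, e_nr), dot(e_nr, e_n), dot(e_n, e_n)
    r1, r2 = dot(delta, e_nr), dot(delta, e_n)
    det = g11 * g22 - g12 * g12  # nonzero when the components are linearly independent
    return (r1 * g22 - r2 * g12) / det, (g11 * r2 - g12 * r1) / det


def recombine(e_nr: Vector, e_n: Vector, w_nr: float, w_n: float, eps: float) -> Vector:
    """Weight the two components, then sign-maximize to the L-infinity budget eps."""
    mixed = [w_nr * a + w_n * b for a, b in zip(e_nr, e_n)]
    return [eps * sign(m) for m in mixed]


e_nr = [0.8, 0.6]    # unit vector standing in for the noise-reduced direction
e_n = [0.6, -0.8]    # unit vector standing in for the noise-dependent direction
delta = [0.02, -0.01]

alpha, beta = coefficients(delta, e_nr, e_n)
print(round(alpha, 6), round(beta, 6))    # 0.01 0.02
print(recombine(e_nr, e_n, 2, 1, 0.03))   # [0.03, 0.03]
print(recombine(e_nr, e_n, 1, 2, 0.03))   # [0.03, -0.03]
```

After sign maximization only the sign pattern matters, so changing the ratio affects the result exactly at the coordinates where the two components disagree in sign.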
3.2 Architecture-Dependent ($\delta_a$) and Data-Dependent ($\delta_d$) Decomposition
To evaluate the decomposition into architecture-specific and data-specific components, we consider the four architectures ResNet18 (He et al., 2016), GoogLeNet (Szegedy et al., 2015), DenseNet121 (Huang et al., 2017), and SENet18 (Hu et al., 2017). Results are given in Table 4. In each experiment we first fix a source architecture $a$ and generate the noise-reduced perturbation $\delta_{nr}$ by attacking 4 retrained copies of $a$, denoted as $M^a$. We then generate the data-dependent component $\delta_d$ by attacking four copies of each of the other three architectures, for twelve models total. We then test on another 4 retrained copies of $a$, called $M'^a$, as well as on a second test set consisting of four copies of each of the other three architectures. We see that for all four source architectures, the architecture-dependent component $\delta_a$ obtains a significantly higher error rate on $M'^a$ than on the test set of other architectures. In addition, the relative error between and for are close to the relative error between and for when averaged across models, supporting the success of our decomposition.
3.3 Ablation
**Orthogonality:** We assume that the $\delta_a$, $\delta_d$, and $\delta_n$ terms are orthogonal. We note that if these vectors had no relation to each other, then, due to the properties of high-dimensional space, they would be approximately orthogonal with very high probability.
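The claim about high-dimensional space can be checked numerically. The sketch below draws pairs of independent Gaussian vectors with the dimensionality of a CIFAR-10 image (32 x 32 x 3 = 3072) and reports the largest absolute cosine similarity observed. Gaussian vectors are only a stand-in for unrelated perturbations, and the seed and number of pairs are arbitrary choices.

```python
import math
import random
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v."""
    inner = sum(a * b for a, b in zip(u, v))
    return inner / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


rng = random.Random(0)
dim = 32 * 32 * 3  # number of values in one CIFAR-10 image

worst = 0.0
for _ in range(20):
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    worst = max(worst, abs(cosine(u, v)))

# The cosine of two independent Gaussian vectors has standard deviation
# of roughly 1 / sqrt(dim), about 0.018 here, so values stay close to zero.
print(worst < 0.1)
```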
We vary orthogonality by modifying the method in Section 2.2 to generate the noise-dependent component with $\delta_n = \delta - c \cdot \mathrm{proj}_{\delta_{nr}} \delta$ for a coefficient $c$. When $c = 1$, we recover the original algorithm, and when $c = 0$, $\delta_n = \delta$. Experimentally varying the orthogonality of $\delta_n$ and $\delta_{nr}$ produces the results in Table 5; note that we achieve the greatest difference in efficacy between the original model and transferred models when they are near-orthogonal, suggesting that the assumption we made is reasonable.
However, it is not true that exactly orthogonal components achieve the best isolation (the peak difference in Table 5 is at $c = 1.2$ rather than $c = 1.0$). This suggests that our current method of decomposition may simply be an approximation of the true components, and that a more nuanced method may be necessary for better isolation.
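To see how a scaling coefficient on the projection controls orthogonality, the sketch below subtracts a scaled projection from a toy perturbation and prints the cosine between the residual and the noise-reduced direction. The coefficient name `c`, the function names, and the vectors are assumptions for illustration; the coefficient values are the ones listed in Table 5 plus zero.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def scaled_residual(delta: Vector, delta_nr: Vector, c: float) -> Vector:
    """delta minus c times its projection onto delta_nr."""
    scale = c * dot(delta, delta_nr) / dot(delta_nr, delta_nr)
    return [a - scale * b for a, b in zip(delta, delta_nr)]


delta = [0.03, -0.01, 0.02, 0.00]
delta_nr = [0.02, 0.00, 0.02, 0.01]

for c in (0.0, 0.1, 0.5, 0.8, 1.0, 1.2, 1.5, 2.0):
    residual = scaled_residual(delta, delta_nr, c)
    # Positive below c = 1, approximately zero at c = 1, negative above it.
    print(c, round(cosine(residual, delta_nr), 3))
```

Coefficients above one push the residual past orthogonality, so it points partly against the shared direction.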
**Number of Models:** We find that the more models we use to approximate the noise-reduced component $\delta_{nr}$, the more successfully we are able to isolate the noise-dependent component $\delta_n$. See Appendix B for full results.
4 Conclusion
We demonstrate that it is possible to decompose adversarial perturbations into noise-dependent and data-dependent components, a hypothesis reviewers thought was interesting but unsupported in Wu et al. (2018). We go even further by decomposing an adversarial perturbation into model-related, data-related, and noise-related perturbations. A major contribution here is a new method of analyzing adversarial examples; this creates many potential future directions for research. One interesting direction would be extending these decompositions to universal perturbations (Moosavi-Dezfooli et al., 2017; Poursaeed et al., 2017), thus removing the dependence on individual data points. Another avenue to explore is analyzing various attacks and defenses and how they interact with these components.
A. Different attack settings
To show that our decomposition is effective across a variety of attack settings, we perform the experiment of Section 3.1 with three different iFGSM settings corresponding to $\epsilon \in \{0.01, 0.03, 0.06\}$. Results are shown in Table 6.
B. Varying number of models/iterations
We investigate the effectiveness of the Section 3.1 decomposition as we vary hyper-parameters. Results for increasing the number of iFGSM iterations are given in Table 7, and results for increasing the number of models are given in Table 8.
C. Justification of Equations
Justification of Equation 1 (Section 2.2)
Recall that the equations are given by
[TABLE]
We assume that the expected value of our noise term is $0$ over all random noise. This is motivated by the fact that the random noise at initialization follows a Gaussian distribution centered at $0$, and it is reasonable to assume that the model distribution and the noise distribution follow a similar pattern.
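Letting $\bar{\delta} = \mathbb{E}[\delta]$ denote the expectation over all random initializations, we claim that $\bar{\delta} = \delta_a + \delta_d$. Indeed, $\delta_a$ and $\delta_d$ are noise independent, which means that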
[TABLE]
where $\delta_n^{(i)}$ is the noise component corresponding to the noise of model $m_i$. Therefore, it follows that
[TABLE]
By the law of large numbers, it follows that $\frac{1}{n}\sum_{i=1}^{n} \delta_n^{(i)} \to 0$. Therefore, for sufficiently large $n$, it follows that
[TABLE]
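The law-of-large-numbers step can be illustrated with a small simulation: each simulated "model" contributes a shared vector plus independent zero-mean Gaussian noise, and the average over more models lands closer to the shared vector. The Gaussian noise model, dimensions, seed, and function names are assumptions chosen for the illustration and do not come from the paper's experiments.

```python
import math
import random
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def averaging_error(n_models: int, dim: int = 50, noise_scale: float = 1.0,
                    seed: int = 0) -> float:
    """Distance between the shared vector and the mean of n noisy copies of it."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    samples = [
        [s + rng.gauss(0.0, noise_scale) for s in shared]
        for _ in range(n_models)
    ]
    average = mean_vector(samples)
    return math.sqrt(sum((a - s) ** 2 for a, s in zip(average, shared)))


# The error shrinks roughly like 1 / sqrt(n_models).
for n in (3, 5, 10, 1000):
    print(n, round(averaging_error(n), 3))
```

This mirrors the ablation on the number of models: a handful of models leaves a visible noise residue, which is consistent with the imperfect decomposition reported for 9 models.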
Since the cross-entropy loss is additive and the attacks that we examine are first-order methods, we have
[TABLE]
To prove the other claim, recall that empirical results, together with the intuition that the noise-reduced component $\delta_{nr}$ and the noise-dependent component $\delta_n$ are linearly independent, indicate that these two components are very close to orthogonal and compose $\delta$. Therefore, we can use the projection of $\delta$ onto $\delta_{nr}$, which implies that
[TABLE]
up to a scaling constant.
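Because the displayed equations of this argument are not reproduced here, the following is a hedged reconstruction of the chain of reasoning from the surrounding prose. The symbols are assumptions: $\delta^{(i)}$ is the perturbation generated on the $i$-th retrained model, $\delta_n^{(i)}$ is its noise component, and $\delta_{nr}$ is the noise-reduced component. The exact form used by the authors may differ.

```latex
\begin{align*}
\delta^{(i)} &= \delta_a + \delta_d + \delta_n^{(i)},
  \qquad \mathbb{E}\big[\delta_n^{(i)}\big] = 0, \\
\frac{1}{n}\sum_{i=1}^{n} \delta^{(i)}
  &= \delta_a + \delta_d + \frac{1}{n}\sum_{i=1}^{n} \delta_n^{(i)}
  \;\longrightarrow\; \delta_a + \delta_d = \delta_{nr}
  \quad (n \to \infty), \\
\delta_n &\approx \delta - \mathrm{proj}_{\delta_{nr}}\,\delta
  \quad \text{(up to a scaling constant)}.
\end{align*}
```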
Justification of Equation 2 (Section 2.3)
Recall that the equations are, given the noise-reduced perturbation $\delta_{nr}$ generated on $M^a$,
[TABLE]
We make two core assumptions:
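- The expected value of the architecture-dependent component over architectures is zero, $\mathbb{E}_a[\delta_a] = 0$. This is a reasonable assumption since our generated architectures should produce roughly symmetric error vectors $\delta_a$.
- Starting the attack from an existing perturbation is equivalent to starting it from scratch, in the sense that the former produces a noise-reduced gradient closer to the starting perturbation. This is reasonable because there are many adversarial perturbations (different directions) in the space, and changing our start location won’t cripple our search space. Furthermore, we use this to generate a component close to the perturbation we start from.
We claim that $\delta_d = \mathbb{E}_a[\delta_{nr}]$, where we take the expectation over architectures $a$. To see this, we note that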
[TABLE]
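and so again we can approximate it with $\frac{1}{k}\sum_{j=1}^{k} \delta_{nr}^{(j)}$, where $\delta_{nr}^{(j)}$ is the noise-reduced component generated for the models of architecture $a_j$. For sufficiently large $k$, it follows that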
[TABLE]
Therefore we have
[TABLE]
and by our assumption this is roughly equivalent to
[TABLE]
as desired. To prove the other claim, we use an argument analogous to the one above: we have shown that the architecture-dependent component $\delta_a$ and the data-dependent component $\delta_d$ are orthogonal, and applying the same projection technique yields
[TABLE]
up to a scaling constant.
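As with the first justification, the displayed equations are not reproduced here, so the following is a hedged reconstruction from the prose. The symbols are assumptions: $\delta_{nr}^{(j)}$ is the noise-reduced perturbation generated on architecture $a_j$, $\delta_a^{(j)}$ is its architecture-dependent part, and $k$ is the number of architectures averaged over. The authors' exact formulation may differ.

```latex
\begin{align*}
\delta_{nr}^{(j)} &= \delta_a^{(j)} + \delta_d,
  \qquad \mathbb{E}_a\big[\delta_a\big] = 0, \\
\frac{1}{k}\sum_{j=1}^{k} \delta_{nr}^{(j)}
  &= \delta_d + \frac{1}{k}\sum_{j=1}^{k} \delta_a^{(j)}
  \;\longrightarrow\; \delta_d
  \quad (k \to \infty), \\
\delta_a &\approx \delta_{nr} - \mathrm{proj}_{\delta_d}\,\delta_{nr}
  \quad \text{(up to a scaling constant)}.
\end{align*}
```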
References
- Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
- He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
- Hu, J., Shen, L., and Sun, G. Squeeze-and-excitation networks. CoRR, abs/1709.01507, 2017.
- Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269, 2017.
- Krizhevsky, A. Learning multiple layers of features from tiny images. 2009.
- Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
- Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 86–94, 2017.
- Poursaeed, O., Katsman, I., Gao, B., and Belongie, S. J. Generative adversarial perturbations. CoRR, abs/1712.02328, 2017.
