Scalable Attribution of Adversarial Attacks via Multi-Task Learning
Zhongyi Guo, Keji Han, Yao Ge, Wei Ji, Yun Li

TL;DR
This paper introduces a multi-task learning framework called MTAA for scalable attribution of adversarial attacks, recognizing attack algorithm, victim model, and hyperparameters simultaneously to improve defense insights.
Contribution
The paper proposes a novel multi-task learning approach for adversarial attribution that considers relationships between attack signatures, addressing limitations of single-label classification methods.
Findings
MTAA effectively recognizes attack signatures on MNIST and ImageNet.
The framework handles false alarms and improves attribution accuracy.
Scalability demonstrated with multiple attack signatures.
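
Table 1: Attribution scenario.

| Attack Algorithm | Hyperparameter | Victim Model |
|---|---|---|
| FGSM | $\varepsilon$: 10/255-200/255 (step 10/255) | InceptionV3, ResNet18, ResNet50, VGG16, VGG19 |
| PGD | $\varepsilon$: 10/255-200/255 (step 10/255) | InceptionV3, ResNet18, ResNet50, VGG16, VGG19 |
| C&W | $\kappa$: 5-100 (step 5) | InceptionV3, ResNet18, ResNet50, VGG16, VGG19 |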

Table 2: Attack Algorithm+Victim Model attribution accuracy (%) on MNIST.

| | InceptionV3 | ResNet18 | ResNet50 | VGG16 | VGG19 | Average |
|---|---|---|---|---|---|---|
| FGSM | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| PGD | 100.00 | 97.00 | 100.00 | 99.75 | 99.25 | 99.20 |
| CW | 99.75 | 99.50 | 100.00 | 100.00 | 99.75 | 99.80 |
| Average | 99.92 | 98.83 | 100.00 | 99.92 | 99.67 | 99.67 |

Table 3: Attack Algorithm+Victim Model attribution accuracy (%) on ImageNet.

| | InceptionV3 | ResNet18 | ResNet50 | VGG16 | VGG19 | Average |
|---|---|---|---|---|---|---|
| FGSM | 99.50 | 99.25 | 99.00 | 99.50 | 92.50 | 97.95 |
| PGD | 99.00 | 98.00 | 98.50 | 95.25 | 98.00 | 97.75 |
| CW | 99.75 | 98.50 | 98.00 | 98.75 | 95.00 | 98.00 |
| Average | 99.42 | 98.58 | 98.50 | 97.83 | 95.17 | 97.90 |
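
Table 4: Attack Algorithm+Victim Model+Hyperparameter attribution accuracy (%) on MNIST, averaged over the 20 hyperparameter values.

| | InceptionV3 | ResNet18 | ResNet50 | VGG16 | VGG19 | Average |
|---|---|---|---|---|---|---|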
| FGSM | 97.25 | 99.50 | 99.75 | 99.75 | 98.00 | 98.85 |
| PGD | 70.00 | 86.50 | 87.00 | 78.50 | 82.50 | 80.90 |
| CW | 13.00 | 14.75 | 16.00 | 12.00 | 7.75 | 12.70 |
| Average | 60.08 | 66.92 | 67.58 | 63.42 | 62.75 | 64.15 |
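
Table 5: Attack Algorithm+Victim Model+Hyperparameter attribution accuracy (%) on ImageNet, averaged over the 20 hyperparameter values.

| | InceptionV3 | ResNet18 | ResNet50 | VGG16 | VGG19 | Average |
|---|---|---|---|---|---|---|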
| FGSM | 96.25 | 89.75 | 97.00 | 92.50 | 95.25 | 94.15 |
| PGD | 71.25 | 57.50 | 60.75 | 66.25 | 61.50 | 63.45 |
| CW | 26.00 | 25.00 | 28.75 | 23.50 | 27.00 | 26.05 |
| Average | 64.50 | 57.42 | 62.17 | 60.75 | 61.25 | 61.22 |

Table 6: Architecture of the Auto-Encoder used as perturbation extractor.

| Structure Name | Output Size | Architecture |
|---|---|---|
| Encoder | 8*8 | [2*2, 512], stride 2, padding 1 |
| | 4*4 | 2*2 maxpooling, stride 2 |
| | 3*3 | [2*2, 256], stride 2, padding 1 |
| | 2*2 | 2*2 maxpooling, stride 1 |
| Decoder | 5*5 | [3*3, 512], stride 2 |
| | 11*11 | [5*5, 256], stride 2, padding 1 |
| | 14*14 | [6*6, 256], stride 1, padding 1 |

| | Backbone | Model | FLOPS(G) | Params(M) | Attack Algorithm(%) | Victim Model(%) | Hyperparameter(RMSE) | $\Delta_{MTL}$(%) |
|---|---|---|---|---|---|---|---|---|
| Single Task | Self-built | [21] | - | - | 88.54 | 83.22 | 12.97 | +0.00 |
| | ResNet-50 | [22] | 12 | 71 | 97.43 | 93.25 | 7.93 | +0.00 |
| | ResNet-101 | | 24 | 128 | 98.68 | 94.72 | 7.33 | +0.00 |
| MTL | ResNet-50 | MTAA | 9 | 48 | 99.68 | 96.95 | 7.32 | +4.66 |
| | ResNet-101 | MTAA | 21 | 108 | 99.78 | 97.84 | 6.79 | +3.93 |

| Attack Algorithms | Victim Models | [21]: Attack Algorithm+Victim Model+Hyperparameter | [22]: Attack Algorithm+Victim Model+Hyperparameter | Pre-experiment: Attack Algorithm+Victim Model+Hyperparameter | MTAA: Attack Algorithm | MTAA: Victim Model | MTAA: Hyperparameter |
|---|---|---|---|---|---|---|---|
| | | 57.53 | 64.15 | 64.15 | 100 | 99.88 | 6.04 |
| | | 83.74 | 92.21 | 92.21 | 100 | 100 | 3.72 |
| | | 55.76 | 68.33 | 68.33 | 100 | 99.96 | 5.71 |
| | | 85.54 | 91.71 | 91.71 | 100 | 100 | 2.06 |
| | | 50.06 | 56.62 | 56.62 | 100 | 99.96 | 7.02 |

| Attack Algorithms | Victim Models | [21]: Attack Algorithm+Victim Model+Hyperparameter | [22]: Attack Algorithm+Victim Model+Hyperparameter | Pre-experiment: Attack Algorithm+Victim Model+Hyperparameter | MTAA: Attack Algorithm | MTAA: Victim Model | MTAA: Hyperparameter |
|---|---|---|---|---|---|---|---|
| | | 50.71 | 59.73 | 61.22 | 99.78 | 97.84 | 6.79 |
| | | 71.26 | 78.53 | 82.96 | 99.97 | 98.93 | 5.96 |
| | | 49.98 | 59.42 | 62.33 | 99.97 | 98.24 | 7.76 |
| | | 63.76 | 71.32 | 84.07 | 99.93 | 98.69 | 6.02 |
| | | 39.98 | 47.21 | 48.04 | 99.97 | 98.21 | 8.01 |

| | Backbone | Model | FLOPS(G) | Params(M) | Attack Algorithm(%) | Victim Model(%) | Hyperparameter(RMSE) | $\Delta_{MTL}$(%) |
|---|---|---|---|---|---|---|---|---|
| Single Task | ResNet-50 | | 12 | 71 | 100 | 99.89 | 7.01 | +0.00 |
| MTL | ResNet-50 | MTAA | 9 | 48 | 100 | 99.98 | 6.51 | +2.41 |

| | Backbone | Model | FLOPS(G) | Params(M) | Attack Algorithm(%) | Victim Model(%) | Hyperparameter(RMSE) | $\Delta_{MTL}$(%) |
|---|---|---|---|---|---|---|---|---|
| Single Task | ResNet-101 | | 24 | 128 | 97.58 | 94.45 | 11.23 | +0.00 |
| MTL | ResNet-101 | MTAA | 21 | 108 | 99.81 | 96.41 | 7.21 | +13.39 |

| Architecture of MTAA | Attack Algorithm | Victim Model | Hyperparameter |
|---|---|---|---|
| ResNet18+simple add loss | 98.92 | 84.72 | 7.88 |
| ResNet18+Uncertainty loss weight | 99.54 | 97.84 | 7.21 |
| ResNet18+Uncertainty loss weight+TSL | 99.79 | 98.37 | 7.02 |
| ResNet50+Uncertainty loss weight+TSL | 99.94 | 99.26 | 6.42 |
| ResNet50+Uncertainty loss weight+TSL+PE | 100 | 99.88 | 6.04 |
| Architecture of MTAA | Attack Algorithm | Victim Model | Hyperparameter |
|---|---|---|---|
| ResNet50+simple add loss | 98.12 | 70.82 | 8.7 |
| ResNet50+Uncertainty loss weight | 99.3 | 95.98 | 7.9 |
| ResNet50+Uncertainty loss weight+TSL | 99.48 | 96.1 | 7.81 |
| ResNet101+Uncertainty loss weight+TSL | 99.61 | 97.13 | 7.25 |
| ResNet101+Uncertainty loss weight+TSL+PE | 99.78 | 97.84 | 6.79 |
Scalable Attribution of Adversarial Attacks via Multi-Task Learning
Zhongyi Guo, Nanjing University of Posts and Telecommunications, Nanjing
Keji Han, Nanjing University of Posts and Telecommunications, Nanjing
Yao Ge, Nanjing University of Posts and Telecommunications, Nanjing
Wei Ji, Nanjing University of Posts and Telecommunications, Nanjing
Yun Li, Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing
Abstract
Deep neural networks (DNNs) can be easily fooled by adversarial attacks during the inference phase, when attackers add imperceptible perturbations to original examples to produce adversarial examples. Many works focus on adversarial detection and adversarial training to defend against adversarial attacks. However, few works explore the tool-chains behind adversarial examples, which can help defenders seize clues about the originator of the attack and their goals, and provide insight into the most effective defense algorithm against the corresponding attacks. To fill this gap, it is necessary to develop techniques that can recognize the tool-chains leveraged to generate adversarial examples, a task called the Adversarial Attribution Problem (AAP). In this paper, AAP is defined as the recognition of three signatures, i.e., attack algorithm, victim model and hyperparameter. Current works cast AAP as a single-label classification task and ignore the relationship between these signatures. The former leads to a combination explosion problem as the number of signatures increases. The latter means that AAP cannot be treated simply as a single-task problem. We first conduct experiments to validate the attributability of adversarial examples. Furthermore, we propose a multi-task learning framework named Multi-Task Adversarial Attribution (MTAA) to recognize the three signatures simultaneously. MTAA contains a perturbation extraction module, an adversarial-only extraction module, and a classification and regression module. It takes the relationship between attack algorithm and corresponding hyperparameter into account and uses an uncertainty weighted loss to adjust the weights of the three recognition tasks. The experimental results on MNIST and ImageNet show the feasibility and scalability of the proposed framework as well as its effectiveness in dealing with false alarms.
Keywords Deep neural network · Adversarial attack attribution · Multi-task learning
1 Introduction
In the past two decades, deep neural networks (DNNs) have shown outstanding performance across various tasks in computer vision, such as image classification [1, 2], semantic segmentation [3, 4] and object detection [5, 6]. Nevertheless, DNNs are demonstrated to be easily fooled by adversarial attack [7, 8], such as evasion attack, which is accomplished during inference phase by adding imperceptible perturbations to examples. Some classic adversarial attack algorithms include Fast Gradient Sign Method (FGSM) [9], Projected Gradient Descent (PGD) [10] and Carlini & Wagner (C&W) [11].
Extensive efforts on adversarial defense focus on detection [12, 13] and adversarial training [10, 14], while few works study the Adversarial Attribution Problem (AAP), which is an important part of Reverse Engineering of Deceptions (RED) [15]. According to the Defense Advanced Research Projects Agency (DARPA), RED is aimed at “developing techniques that automatically reverse engineer the tool-chains behind attacks such as multimedia falsification, adversarial machine learning attacks, or other information deception attacks.” Sharing the purpose of RED and extending adversarial detection, as shown in Figure 1, AAP concentrates on better understanding the hidden signatures in adversarial examples, i.e., attack algorithm, victim model and hyperparameter.
With the rapid development of deep learning, an increasing number of publicly available adversarial tools, frameworks and models can be downloaded and modified for adversarial purposes [16, 17]. Besides, it remains unknown whether fully defending neural networks against adversarial attacks is computationally costly or theoretically impossible [18, 19]. Adversarial attribution can further help defenders seize clues about the originator of the attack and their goals, and provide insight into the most effective defense algorithm against the corresponding attacks [15]. The following works make a preliminary exploration of AAP. Ref. [20] investigates the attribution of attack types with fewer training samples through self-supervised learning. Ref. [21] primarily explores the attributability of attack algorithm, victim model, hyperparameter and norm using a self-built 11-layer neural network. However, it considers these four signatures separately and ignores the relationship between them. Moreover, its experiments are conducted on small-scale datasets such as MNIST and CIFAR-10 and thus lack dataset diversity. Ref. [22] explores the attribution of attack algorithm and victim model using ResNet50’s feature extractor plus a Multilayer Perceptron (MLP) classifier. Ref. [23] explores the attribution of attack algorithm and hyperparameter using ResNet50 as backbone. Unfortunately, Ref. [20], Ref. [22] and Ref. [23] study the recognition of only one or two signatures, and Ref. [21] considers hyperparameter values at large intervals, e.g., 0.03, 0.1, 0.2 for the maximum perturbation $\varepsilon$ in FGSM and 0.01, 0.1, 1.0 for the confidence $\kappa$ in C&W, so signature recognition remains incomplete. Moreover, all of these works cast AAP as a single-task classification problem, i.e., they combine the signatures into one label, such as FGSM+ResNet18+10/255. Last but not least, the relationship among these signatures is neglected in these works.
Overall, it is urgent to propose a unified and extensible framework for adversarial attribution to deal with more signatures and alleviate the combination explosion issue.
In this paper, to address AAP, we first provide pre-experiment results to comprehensively discuss the attributability of adversarial attacks on a large-scale dataset like ImageNet, adopting a typical DNN to classify the combination of three signatures, i.e., attack algorithm, victim model and hyperparameter. Then, to alleviate the combination explosion problem of the three signatures and leverage the relationship between them, we propose a multi-task learning framework that solves AAP and scales in terms of both model architecture and attribution scenario. For a fair comparison, we compare MTAA with single-task models, i.e., an individual DNN trained for each of the three signatures. Experimental results show that AAP should be considered a multi-task learning problem rather than a single-label classification problem or a single-task learning problem. Finally, we further consider the false alarms caused by clean images, i.e., the case where an adversarial detector cannot easily distinguish clean examples from adversarial examples.
We summarize the main contributions as follows:
- The attributability of adversarial attacks, especially on a large-scale dataset, is discussed and verified.
- A Multi-Task Adversarial Attribution (MTAA) model is proposed to explore AAP and recognize three signatures simultaneously.
- Experiments are conducted to illustrate the high performance of our MTAA in scalability and generalization.
2 Related work
Adversarial Attack was first discovered by Szegedy et al. [7], who revealed a vulnerability of deep learning models: attackers can manipulate their predictions by adding visually imperceptible perturbations to images. Since then, a large number of adversarial attack algorithms have emerged. The most representative among them are gradient-based attacks such as the one-shot FGSM [9] and the iterative PGD [10], as well as optimization-based attacks such as C&W [11].
FGSM [9] crafts adversarial example with the sign of gradient in regard to ground truth label and can be formulated as:
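$$x' = x + \varepsilon \cdot \operatorname{sign}\left(\nabla_{x} J\left(h_{\theta}(x), y\right)\right) \tag{1}$$
where $x$ is the clean example and $y$ is its label. $h\left(\cdot\right)$ is the victim model whose parameter is $\theta$. $J$ is the loss function. $\nabla_{x} J$ is the gradient of $J$ with respect to $x$. $\operatorname{sign}\left(\cdot\right)$ is the gradient sign function. $\varepsilon$ is the hyperparameter that controls the attack intensity.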
PGD [10] can be seen as the iterative version of FGSM and is formulated as:
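$$x'_{t+1} = \operatorname{Clip}_{x,\varepsilon}\left(x'_{t} + \alpha \cdot \operatorname{sign}\left(\nabla_{x} J\left(h_{\theta}(x'_{t}), y\right)\right)\right) \tag{2}$$
where $x'_{t}$ is the adversarial example in step $t$ and $y$ is its label. $J$ is the loss function. $\nabla_{x} J$ is the gradient of $J$ with respect to the input. $\operatorname{sign}\left(\cdot\right)$ is the gradient sign function. $\operatorname{Clip}_{x,\varepsilon}\left(\cdot\right)$ performs clipping at attack intensity $\varepsilon$. $\alpha$ is the step size in each attack iteration.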
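The two update rules can be illustrated with a minimal pure-Python sketch. It assumes the standard FGSM and PGD formulations, a toy logistic-regression victim (chosen only because its input gradient has a closed form), and clipping interpreted as projection onto the $L_\infty$ ball of radius $\varepsilon$ around the clean example. The function names and the toy model are illustrative and are not taken from the paper or from Cleverhans; clipping to the valid pixel range is omitted.

```python
import math


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def loss_grad_x(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for a toy
    logistic model h(x) = sigmoid(w.x + b) (illustrative victim model)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(x, y, w, b, eps):
    """One-shot attack: move every coordinate by eps along the gradient sign."""
    g = loss_grad_x(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(x, y, w, b, eps, alpha, steps):
    """Iterative attack: repeated sign steps of size alpha, each followed by
    clipping back into the eps-ball around the clean example x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(x_adv, y, w, b)
        x_adv = [xi + alpha * sign(gi) for xi, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv


x_clean = [0.2, 0.5, 0.8]
weights, bias, label = [1.0, -2.0, 0.5], 0.0, 1
x_fgsm = fgsm(x_clean, label, weights, bias, eps=0.1)
x_pgd = pgd(x_clean, label, weights, bias, eps=0.1, alpha=0.04, steps=5)
print(x_fgsm, x_pgd)
```

In this toy setting the gradient sign never changes, so PGD saturates at the boundary of the $\varepsilon$-ball and ends at the same point as FGSM; with a real DNN the two attacks generally differ.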
C&W [11] computes the adversarial perturbation by solving the following optimisation problem:
$$\min_{r} \; \left\|r\right\|_{p} + c \cdot t\left(x + r\right) \quad \text{s.t.} \quad x + r \in [0,1]^{n} \tag{3}$$
where $r$ is the perturbation to be optimized. $c$ is a suitably chosen constant. $t\left(\cdot\right)$ is an objective function satisfying $h\left(x + r\right) = l$ if and only if $t\left(x + r\right) \leq 0$, in which $h\left(\cdot\right)$ is the victim model and $l$ is the target label.
Multi-Task Learning (MTL) utilizes the information contained in multiple related tasks to improve the generalization performance on all tasks [24]. Given $m$ learning tasks $\{T_{i}\}_{i=1}^{m}$ where all or part of them are related, multi-task learning seeks a balanced strategy that learns the tasks together, improving the performance of a model on each task $T_{i}$ by leveraging the knowledge contained in all or part of the other tasks. A task $T_{i}$ is usually accompanied by a training dataset $D_{i}$ consisting of $n_{i}$ training samples, i.e., $D_{i} = \{(x_{j}^{i}, y_{j}^{i})\}_{j=1}^{n_{i}}$, where $x_{j}^{i} \in \mathbb{R}^{d_{i}}$ is the $j$th training sample in $T_{i}$ and $y_{j}^{i}$ is its label. If different tasks are located in the same feature space, which means $d_{a}$ equals $d_{b}$ for any $a \neq b$, this MTL belongs to homogeneous-feature MTL; otherwise it belongs to heterogeneous-feature MTL. The major frameworks of multi-task learning DNNs can be classified into hard parameter sharing, which shares the hidden layers between all tasks, and soft parameter sharing, which designs a separate model for each task. As far as we know, few works use multi-task learning to explore AAP.
3 Methodology
3.1 Overview
In this section, we first introduce the pre-experiment to validate the attributability of adversarial examples, and then propose the multi-task learning framework for AAP. The attribution scenario we consider is shown in Table 1. $\varepsilon$ in FGSM and PGD is the maximum perturbation. For PGD, $\alpha$ is the step size and step is the number of attack iterations. For C&W, $\kappa$ is the confidence of the attack, $c$ is the parameter for the box constraint and step is the number of attack iterations. For the hyperparameter, we consider the maximum perturbation $\varepsilon$ for FGSM and PGD ranging from 10/255 to 200/255 with step size 10/255, and the confidence $\kappa$ for C&W ranging from 5 to 100 with step size 5.
3.2 Pre-experiment
We conduct a pre-experiment to discuss the feasibility of adversarial attribution. The pre-experiment consists of three steps: (1) AAP analysis: AAP aims to recognize the three signatures behind adversarial examples. As in [21] and [22], we treat it as a single-label classification problem in the pre-experiment and discuss two types of single-label classification tasks. The first is the combination of attack algorithm and victim model. For example, combining the 3 attack algorithms and 5 victim models in Table 1 yields a classification task with 3×5=15 Attack Algorithm+Victim Model classes, such as FGSM+InceptionV3, FGSM+ResNet18, etc. The second is the combination of attack algorithm, victim model and hyperparameter. For example, combining 3 attack algorithms, 5 victim models and 20 hyperparameter values yields a classification task with 3×5×20=300 Attack Algorithm+Victim Model+Hyperparameter classes, such as FGSM+InceptionV3+10/255, FGSM+ResNet18+20/255, etc. Note that all victim models are pretrained on the corresponding dataset. (2) Generate adversarial examples for these classes: we generate the different types of adversarial examples with the settings in Table 1. (3) Train classifier: we train a classifier to accomplish the above two classification tasks for adversarial attribution.
We use the shuffled MNIST training dataset, which contains 55000 images of size 28 × 28, and the ImageNet validation dataset, which contains 50000 images of size 224 × 224, along with the attack toolbox Cleverhans [16] to generate adversarial examples. For each Attack Algorithm+Victim Model class, we generate 3200 examples for training and 400 examples for testing. For each Attack Algorithm+Victim Model+Hyperparameter class, we generate 160 examples for training and 20 examples for testing. We choose pre-trained ResNet50 and ResNet101 as the classifier for MNIST and ImageNet, respectively. Adam [25] is employed with a cosine annealing LR schedule, with initial learning rate $\beta=0.001$, weight decay $\pi=0.001$ and mini-batch size 64. The attribution performance on MNIST is shown in Tables 2 and 4, and the attribution performance on ImageNet is shown in Tables 3 and 5. We also visualize the t-SNE plot of the 15-class attack-model classification task on ImageNet using the logits before the Softmax layer of ResNet101 in Figure 2.
As shown in Tables 2 and 3, we can clearly observe the success of attack algorithm and victim model attribution on MNIST and ImageNet, with a high average accuracy of 99.67 and 95.78, respectively. Moreover, two conclusions can be drawn from Figure 2: (1) the manifolds of different attack algorithms can be clearly separated, which means it is easy to recognize attack algorithms. The perturbation patterns of different attack algorithms on ImageNet are shown in Figure 3, which indicates that C&W’s perturbation is covert, while the perturbations of PGD and FGSM are relatively obvious. (2) there is a slight overlap between the manifolds of VGG16 and VGG19 for all three attack algorithms, which is consistent with the results in Table 3, where VGG16 and VGG19 have relatively low attribution accuracy. The reason lies in the similarity between the structures of these two victim models, which is consistent with the conclusion of [22]. Figure 4 shows the perturbation patterns of different victim models on ImageNet. These patterns cannot be distinguished visually but can be easily distinguished by DNNs.
The elements in Tables 4 and 5 are the average over the 20 hyperparameter values of each Attack Algorithm+Victim Model combination on MNIST and ImageNet, respectively. The average classification accuracy in Tables 4 and 5 is higher than random guessing, which shows the feasibility of Attack Algorithm+Victim Model+Hyperparameter attribution on these two datasets. Moreover, for FGSM, its $\varepsilon$ is classified with high accuracy in the first row of both Table 4 and Table 5, because FGSM is a one-step attack that constrains the maximum perturbation at each pixel. For C&W, by contrast, the recognition accuracy of the attack’s hyperparameter is low, as shown in the third row of both Table 4 and Table 5. As shown in Figure 3, C&W is an optimization-based attack that takes minimizing the perturbation as its objective and thus produces more subtle perturbations than FGSM and PGD.
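As a quick arithmetic check of the data budget described above, the following sketch multiplies the class counts by the per-class sample counts stated in the text. The variable names are illustrative only.

```python
# Class counts from the attribution scenario: 3 attacks, 5 victim models,
# 20 hyperparameter values per attack.
n_attacks, n_victims, n_hyper = 3, 5, 20

pair_classes = n_attacks * n_victims       # Attack Algorithm+Victim Model
triple_classes = pair_classes * n_hyper    # ...+Hyperparameter

# Per-class sample counts as stated in the text.
pair_train, pair_test = pair_classes * 3200, pair_classes * 400
triple_train, triple_test = triple_classes * 160, triple_classes * 20

print(pair_classes, pair_train, pair_test)        # 15 48000 6000
print(triple_classes, triple_train, triple_test)  # 300 48000 6000
```

Both pre-experiment tasks therefore use the same total number of adversarial examples; only the label granularity changes, so each of the 300 fine-grained classes is seen 20 times less often than each of the 15 coarse classes.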
3.3 Multi-Task Adversarial Attribution (MTAA)
Our pre-experiment shows that the above three signatures are attributable. However, both prior works and our pre-experiment treat attribution of three signatures as a single-label classification problem. Besides, they ignore the relationship between these signatures and the numerical value of hyperparameter.
Before introducing our work, we want to explain the following two questions first:
1. Why should AAP not be treated as a single-label classification problem?
There are innumerable kinds of attack algorithms and victim models in a real-world setting. Following the attribution scenario in Table 1, if we only consider 2 attack algorithms and 3 victim models with 20 different hyperparameter values, there will be 2×3×20=120 classes. If we consider 3 attack algorithms, 5 victim models and 20 hyperparameter values, there will be 3×5×20=300 classes. With 5 attack algorithms, 8 victim models and 20 hyperparameter values, there will already be 5×8×20=800 classes. We call this the combination explosion problem in single-label classification. The classification performance of the attribution framework becomes highly unstable as the number of classes grows, which will be discussed in Section 4.3. Besides, the value of the hyperparameter is actually continuous, hence hyperparameter recognition should be regarded as a regression task.
2. Why should AAP be treated as a multi-task learning problem rather than a single-task learning problem?
As shown in Table 1, each class of attack algorithms has its own hyperparameter, which means hyperparameter recognition relies on the result of attack algorithm classification. Multi-task learning can naturally deal with this owner-member relationship because it aims to design networks capable of learning shared representations from multi-task supervisory signals. Most importantly, such networks have the potential for improved performance if the associated tasks share complementary information. By contrast, single-task learning solves each task separately with an individual network and ignores the relationship between the above adversarial attack signatures. The experimental results in Section 4.2 demonstrate the advantages of multi-task learning in AAP.
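The growth described in the first question can be made concrete with a small sketch that compares the number of single-label classes with the number of outputs a multi-task model needs. The output count assumes two classification heads plus one scalar regression output, as in the framework proposed here; the helper names are hypothetical.

```python
def single_label_classes(n_attacks, n_victims, n_hyper):
    """One class per (attack, victim model, hyperparameter value) combination."""
    return n_attacks * n_victims * n_hyper


def multi_task_outputs(n_attacks, n_victims):
    """Two classification heads plus one scalar regression output (assumption)."""
    return n_attacks + n_victims + 1


for n_attacks, n_victims in [(2, 3), (3, 5), (5, 8)]:
    print(n_attacks, n_victims,
          single_label_classes(n_attacks, n_victims, 20),
          multi_task_outputs(n_attacks, n_victims))
# 2 3 120 6
# 3 5 300 9
# 5 8 800 14
```

The single-label formulation grows multiplicatively with every added signature value, while the multi-task formulation grows additively and does not depend on how finely the hyperparameter range is sampled.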
In order to relieve the combination explosion problem and utilize the relationship among attack algorithm, victim model and hyperparameter attribution tasks, we propose a multi-task learning framework to solve AAP. According to the previous discussion of MTL [26], AAP should be viewed as heterogeneous-feature MTL because it consists of different types of supervised tasks including classification and regression ones.
3.3.1 Overall Architecture
As shown in Figure 5, the architecture of the multi-task learning framework for AAP can be divided into three parts: (1) a perturbation extractor that leverages information from both clean and adversarial examples; (2) an adversarial-only extractor that utilizes only adversarial examples; (3) a classification&regression module, in which two classification layers and one regression layer implement multi-task learning for attack algorithm classification, victim model classification and hyperparameter regression, and whose input is the concatenation of the outputs of the perturbation extraction module and the adversarial-only extraction module.
3.3.2 Perturbation Extraction Module
The perturbation extraction module leverages information from both clean and adversarial examples. We use an Auto-Encoder (AE) as the perturbation extractor; its architecture is shown in Table 6. An AE learns effective representations of a set of data in an unsupervised manner. With an encoder $E$ and a decoder $D$, the AE is forced to minimize the reconstruction error for each input sample $x$. However, learning the background information is unhelpful for adversarial attribution, and it has also been shown by [27] that building a flow density estimator on a latent representation (feature maps) works better than on the raw image. In addition, we find that the difference between the feature maps of the original images and those of the adversarial examples becomes larger in deeper layers, as shown in Figure 6. Thus we add a ResNet101 feature extractor (or another DNN) before the AE to obtain the latent representation (feature maps, fm for short) of adversarial and corresponding clean examples, which helps the AE better learn the difference between them. We therefore optimize the AE by minimizing the loss function:
$$\mathcal{L}_{AE} = \left\| D\left(E\left(fm_{adv}\right)\right) - \left(fm_{adv} - fm_{clean}\right) \right\|_{2}^{2} \tag{4}$$
The objective of function (4) is to let the feature-level perturbation $D\left(E\left(fm_{adv}\right)\right) \approx fm_{adv} - fm_{clean}$, where $fm_{adv}$ denotes the feature maps of adversarial examples and $fm_{clean}$ those of the corresponding clean examples. We train the AE to learn manifolds of adversarial perturbation as the augmented feature.
3.3.3 Adversarial-Only Extraction Module
The adversarial-only extraction module takes only adversarial examples as input. Adversarial examples first pass through global shared layers, which use ResNet101’s feature extractor (or another DNN) to learn the shared representation of the three signatures. Task specific layers then learn two task-specific representations separately: one for attack algorithm and hyperparameter, and one for victim model. Note that the task specific layers have the same structure as the final feature extraction layers of ResNet101. We also use a high-low feature fusion when learning the representation of the attack and hyperparameter signatures because, as discussed in the pre-experiment, the hyperparameter is harder to recognize than the other two signatures. The high-low feature fusion concatenates high-level and low-level feature maps taken from layers near the output and the input of ResNet101, respectively. Generally, the receptive field of low-level features is small, so they capture partial, detailed information, which helps recognize attacks that constrain the perturbation at each pixel. The receptive field of high-level features is large, so they capture integral, rich information, which helps recognize attacks that constrain the perturbation over all pixels. Finally, the feature vectors learnt by the task specific layers are sent to the classification&regression module.
3.3.4 Classification&Regression Module
We define attack algorithm and victim model attribution as two classification tasks. Each of the two classification layers is a fully connected layer. We optimize these two classification tasks by minimizing two cross-entropy losses:
$$\mathcal{L}_{CE_{1}} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{t=1}^{A} Q_{j}^{t} \log P_{j}^{t} \tag{5}$$
$$\mathcal{L}_{CE_{2}} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{t=1}^{V} Q_{j}^{t} \log P_{j}^{t} \tag{6}$$
where $\mathcal{L}_{CE_{1}}$ and $\mathcal{L}_{CE_{2}}$ are the loss functions of attack algorithm and victim model classification, respectively. The total number of adversarial examples is $N$, and $Q_{j}^{t}$ is an indicator function that judges whether the label of $x_{j}'$ is the same as $t$. $A$ and $V$ are the numbers of attack algorithm and victim model labels, respectively. $P_{j}^{t}$ estimates the probability that $x_{j}'$ belongs to label $t$ with a softmax function.
We define hyperparameter attribution as a mixed linear regression task. To model the dependency between attack algorithm and hyperparameter, we concatenate the result of the attack classifier with the extracted features as the input of the hyperparameter regression. Formally, the regression problem is expressed as:
$$\hat{Y} = \tau F + \gamma Z + \delta \tag{7}$$
where $\hat{Y}$ is the predicted value of the hyperparameter, $F$ is the features extracted by the task specific layers for attack and hyperparameter as well as by the perturbation extraction module, $\tau$ is the weight of $F$, $Z$ is the logits of the attack classification layer, $\gamma$ is the weight of $Z$, and $\delta$ is stochastic noise with mean 0 and variance $\sigma$ that accounts for the measurement error of the data itself.
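The dependency of the regression on the attack classifier can be sketched as follows. This is a minimal illustration under the assumption that the regression layer is a single linear map over the concatenation of the extracted features and the attack logits, with the noise term omitted at prediction time; the function names and toy numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def predict_hyperparameter(features, attack_logits, tau, gamma):
    """Two-term form: weight the features and the attack logits separately."""
    return dot(tau, features) + dot(gamma, attack_logits)


def predict_from_concat(features, attack_logits, weights):
    """Equivalent form: one linear layer over the concatenated input."""
    return dot(weights, features + attack_logits)


features = [0.5, -1.0]            # toy extracted features
attack_logits = [2.0, 0.1, -0.3]  # toy logits for FGSM, PGD, C&W
tau, gamma = [0.2, 0.4], [1.0, 0.5, -1.0]

y_two_term = predict_hyperparameter(features, attack_logits, tau, gamma)
y_concat = predict_from_concat(features, attack_logits, tau + gamma)
print(y_two_term, y_concat)
```

Writing the regression over the concatenated vector makes explicit that the predicted hyperparameter can shift with the attack the classifier believes it is seeing, which is the owner-member relationship the framework is designed to exploit.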
We optimize this regression task by minimizing the mean square error (MSE) loss:
$$\mathcal{L}_{MSE} = \frac{1}{M}\sum_{j=1}^{M}\left(\hat{Y}_{j} - \overline{Y}_{j}\right)^{2} \tag{8}$$
where $\hat{Y}_{j}$ is the estimated value of the hyperparameter of $x_{j}'$ and $\overline{Y}_{j}$ is the ground truth. $M$ is the total number of hyperparameter values.
3.4 Uncertainty Weighted Losses
As observed in [28], the performance of a multi-task learning model depends strongly on the relative weights of the different tasks. We therefore adopt the standard uncertainty weighted losses, which leverage homoscedastic uncertainty, i.e., uncertainty that does not depend on the input data but on the task. Following the steps of maximum likelihood inference, we first describe the probabilistic models of regression tasks and classification tasks as (9) and (10), respectively:
$$p\left(y \mid f^{W}\left(x\right)\right) = \mathcal{N}\left(f^{W}\left(x\right), \sigma^{2}\right) \tag{9}$$
$$p\left(y \mid f^{W}\left(x\right), \sigma_{2}\right) = \operatorname{Softmax}\left(\frac{1}{\sigma_{2}^{2}} f^{W}\left(x\right)\right) \tag{10}$$
where $f^{W}\left(x\right)$ is the output of a multi-task learning model with parameters $W$ and input $x$. $\mathcal{N}\left(\cdot\right)$ is the Gaussian likelihood with an observation noise parameter $\sigma$. The classification likelihood is scaled by $\sigma_{2}^{2}$ so that it follows a Boltzmann distribution over the logits of the model through a Softmax function.
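The second step factorizes the likelihood over the outputs, as illustrated in Eq. (11), and the third step takes the log of the likelihood function to conduct maximum likelihood inference. The inference of regression tasks and classification tasks is given in Eq. (12) and (13), respectively:
$$p\left(s_{1},\ldots,s_{k} \mid f^{W}\left(x\right)\right) = p\left(s_{1} \mid f^{W}\left(x\right)\right) \cdots p\left(s_{k} \mid f^{W}\left(x\right)\right) \tag{11}$$
where $s_{1},\ldots,s_{k}$ are the model’s outputs for $k$ tasks, such as attack algorithm and victim model classification as well as hyperparameter regression in AAP.
$$\log p\left(y \mid f^{W}\left(x\right)\right) \propto -\frac{1}{2\sigma_{1}^{2}}\left\|y - f^{W}\left(x\right)\right\|^{2} - \log \sigma_{1} \tag{12}$$
$$\log p\left(y = c \mid f^{W}\left(x\right), \sigma_{2}\right) = \frac{1}{\sigma_{2}^{2}} f_{c}^{W}\left(x\right) - \log \sum_{c'} \exp\left(\frac{1}{\sigma_{2}^{2}} f_{c'}^{W}\left(x\right)\right) \tag{13}$$
where $\sigma_{1}$ is the model’s observation noise parameter that measures the amount of noise in the outputs. $f_{c}^{W}\left(x\right)$ is the $c$’th component of the vector $f^{W}\left(x\right)$.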
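In our case, by adding uncertainty weighted losses to the overall loss function and following maximum likelihood inference, we can formally describe the combined loss function as follows:
$$\begin{aligned}
\mathcal{L}\left(W, \sigma_{1}, \sigma_{2}, \sigma_{3}\right) &= -\log p\left(y_{1}, y_{2}, y_{3} \mid f^{W}\left(x\right)\right) \\
&= -\log \left[\mathcal{N}\left(y_{1}; f^{W}\left(x\right), \sigma_{1}^{2}\right) \cdot \operatorname{Softmax}\left(y_{2}; f^{W}\left(x\right), \sigma_{2}\right) \cdot \operatorname{Softmax}\left(y_{3}; f^{W}\left(x\right), \sigma_{3}\right)\right] \\
&\approx \frac{1}{2\sigma_{1}^{2}}\left\|y_{1} - f^{W}\left(x\right)\right\|^{2} + \log \sigma_{1} - \log p\left(y_{2} \mid f^{W}\left(x\right), \sigma_{2}\right) - \log p\left(y_{3} \mid f^{W}\left(x\right), \sigma_{3}\right) \\
&\approx \frac{1}{2\sigma_{1}^{2}}\mathcal{L}_{1}\left(W\right) + \frac{1}{\sigma_{2}^{2}}\mathcal{L}_{2}\left(W\right) + \frac{1}{\sigma_{3}^{2}}\mathcal{L}_{3}\left(W\right) + \log \sigma_{1} + \log \sigma_{2} + \log \sigma_{3}
\end{aligned} \tag{14}$$
where $\mathcal{L}_{1}\left(W\right) = \left\|y_{1} - f^{W}\left(x\right)\right\|^{2}$ represents the MSE loss of the hyperparameter regression task, $\mathcal{L}_{2}\left(W\right) = -\log\left(\operatorname{Softmax}\left(y_{2}, f^{W}\left(x\right)\right)\right)$ represents the cross-entropy loss of the attack algorithm classification task, and $\mathcal{L}_{3}\left(W\right) = -\log\left(\operatorname{Softmax}\left(y_{3}, f^{W}\left(x\right)\right)\right)$ represents the cross-entropy loss of the victim model classification task. The third step in the equation transformation uses the approximation in (12) and (13). We can minimize the combined loss function by optimizing the parameters $W$, $\sigma_{1}$, $\sigma_{2}$ and $\sigma_{3}$. In order to simplify the optimization objective and improve the experimental results, the explicit simplifying assumption is used in the last approximate transition. $\sigma_{1}$, $\sigma_{2}$ and $\sigma_{3}$ measure the uncertainty of each task, which means that a higher scale value lowers the contribution of the corresponding loss. The three scales are regulated by the last three terms in the formula, which penalize the objective when the values are too high.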
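A minimal sketch of the combined objective follows, assuming the standard homoscedastic-uncertainty form from [28]: the regression loss is weighted by $1/(2\sigma_1^2)$, each classification loss by $1/\sigma_i^2$, and a $\log \sigma_i$ regularizer is added per task. Parameterizing each task by its log-variance $s_i = \log \sigma_i^2$ is a common implementation choice for numerical stability and is an assumption here, not something stated in the paper.

```python
import math


def uncertainty_weighted_loss(mse, ce_attack, ce_victim, log_vars):
    """Combine one regression loss and two classification losses.

    log_vars holds s_i = log(sigma_i^2) for (regression, attack, victim),
    so exp(-s_i) = 1 / sigma_i^2 and 0.5 * s_i = log(sigma_i).
    """
    s1, s2, s3 = log_vars
    return (0.5 * math.exp(-s1) * mse + 0.5 * s1
            + math.exp(-s2) * ce_attack + 0.5 * s2
            + math.exp(-s3) * ce_victim + 0.5 * s3)


# With all sigma_i = 1 the objective is a fixed weighting of the raw losses.
base = uncertainty_weighted_loss(4.0, 0.5, 0.25, (0.0, 0.0, 0.0))

# Raising the regression uncertainty to sigma_1^2 = 4 down-weights the noisy
# regression term, at the price of the log(sigma_1) penalty.
tuned = uncertainty_weighted_loss(4.0, 0.5, 0.25, (math.log(4.0), 0.0, 0.0))
print(base, tuned)
```

In training, the three log-variances would be learnable parameters updated together with the network weights, so that the balance between the tasks is learnt rather than hand-tuned.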
4 Experiments
In this section, we evaluate the performance of the proposed method. We ran all experiments on a computer with 2 Intel Xeon Platinum 8255C 2.50GHz *32 CPUs, 43GB memory and 2 NVIDIA RTX3090 GPUs. Our model was implemented in PyTorch.
4.1 Experimental Setup
Datasets The datasets and the split into training and testing sets are introduced in Section 3.2.
Parameter setting For the perturbation extraction module of our multi-task learning architecture, we use pretrained ResNet50 and ResNet101 to extract feature maps of MNIST and ImageNet, respectively, and an Auto-Encoder to extract the perturbation. For the adversarial-only extraction module, we use pretrained ResNet50/ResNet101 as the global shared layers and two of ResNet50's/ResNet101's last bottlenecks as the task-specific layers for MNIST and ImageNet, respectively. Adam [26] is employed with a cosine annealing learning-rate schedule, with initial learning rate $\beta=0.001$, weight decay $\pi=0.001$ and mini-batch size 64. In addition, to unify the measurement scale of $\varepsilon$ for FGSM and PGD and of $\kappa$ for C&W in the hyperparameter regression task, we multiply the attack intensity labels of FGSM and PGD by 255. The concrete attribution scenario is shown in Table 1.
Evaluation metrics Accuracy is used to evaluate the attack algorithm classification task and the victim model classification task; the higher the accuracy, the better the classification performance. The Root Mean Square Error (RMSE) is used to evaluate the hyperparameter regression task; the lower the RMSE, the better the regression performance. We measure the multi-task learning performance as in [29], i.e., the multi-task performance of model f is the average per-task drop in performance w.r.t. the single-task baseline B:
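As an illustration of the learning-rate schedule, the sketch below evaluates the standard closed-form cosine annealing curve starting from the stated initial rate of 0.001. The schedule length (`total_steps`) and the minimum rate (`min_lr`) are not given in the text and are assumptions; the function name is hypothetical.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr=0.001, min_lr=0.0):
    """Learning rate at `step` under closed-form cosine annealing.

    base_lr matches the initial learning rate stated in the text;
    total_steps and min_lr are assumed values chosen by the caller.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    cosine = math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)
```

The rate starts at `base_lr`, passes through the midpoint between `base_lr` and `min_lr` halfway through, and reaches `min_lr` at the final step.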
[TABLE]
where the indicator for task k is 1 if a lower value means better performance for the metric of task k, and 0 otherwise. The single-task performance is measured for a fully converged model that uses the same backbone network to perform only that task.
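The metric can be sketched as follows, assuming it is the signed relative change of each task metric with respect to its single-task baseline, averaged over tasks, with the sign flipped for metrics where lower is better (such as RMSE). Under this convention a positive value means the multi-task model improves on the baselines. The function name and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def multitask_performance(mtl_scores, baseline_scores, lower_is_better):
    """Average per-task relative change of a multi-task model vs. single-task baselines.

    mtl_scores, baseline_scores: per-task metric values in the same task order.
    lower_is_better: per-task flags, True for metrics such as RMSE.
    """
    if not (len(mtl_scores) == len(baseline_scores) == len(lower_is_better)):
        raise ValueError("all inputs must have one entry per task")
    if not mtl_scores:
        raise ValueError("at least one task is required")

    total = 0.0
    for mtl, base, lower in zip(mtl_scores, baseline_scores, lower_is_better):
        sign = -1.0 if lower else 1.0  # flip sign when a decrease is an improvement
        total += sign * (mtl - base) / base
    return total / len(mtl_scores)
```

For example, with a hypothetical accuracy rising from 80 to 100 and a hypothetical RMSE falling from 10 to 5, `multitask_performance([100.0, 5.0], [80.0, 10.0], [False, True])` averages a 25% gain and a 50% gain.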
In addition to a performance evaluation, we also consider the model resources, i.e., number of parameters and FLOPS, when comparing the multi-task architectures.
4.2 Attribution Experimental Results
In this section, we compare the performance of our MTAA with the corresponding single-task baselines, i.e., an individual DNN trained for each of the three tasks, all relying on the same backbone. Note that [21] and [22] treat AAP as a single-label classification task, i.e., they combine attack algorithm, victim model and hyperparameter into one label. To compare our MTAA with these two works, we reproduce their backbones and treat them as single-task baselines. First, we compare the results of MTAA on MNIST and ImageNet with [21] and [22] as single-task baselines in Table 7 and 8, respectively. For ImageNet, we also show the single-task results with a ResNet101 backbone to ensure a fair comparison. For the FGSM and PGD attacks, the RMSE is actually 6.04/255 on MNIST and 6.79/255 on ImageNet, because we magnify the labels of $\varepsilon$ by 255 times to unify the measurement scale.
Our MTAA offers several advantages over single-task learning: a smaller memory footprint, fewer calculations, and improved performance on all three signatures. The multi-task learning performance reaches 1.17% and 3.93% on MNIST and ImageNet, respectively. On the two classification tasks, our framework achieves test accuracies of 100 and 99.88 on MNIST, and 99.78 and 97.84 on ImageNet, respectively. Nevertheless, MNIST is a small-scale, single-channel dataset, so the performance of the backbone saturates and the advantage of MTAA is less obvious than on ImageNet. For the regression task, the RMSE decreases to 6.04 and 6.79 on MNIST and ImageNet, respectively, which means our framework is capable of distinguishing between different hyperparameter values on both datasets. The success of MTAA lies in the following: (1) MTAA takes the relationship between attack and hyperparameter attribution into account. (2) The uncertainty weighted loss strikes a good balance between the importance of the three signature attribution tasks, which is consistent with our ablation study in Section 4.5. (3) The perturbation extraction module uses information from both clean and adversarial examples, thus playing a supporting role in the attribution task.
From Figure 7 we can observe the increase in training and testing accuracy of attack algorithm and victim model classification, as well as the reduction in RMSE of hyperparameter regression, on ImageNet. It also indicates that attack algorithm attribution converges faster than victim model and hyperparameter attribution. We attribute this phenomenon to the different uncertainty of the three tasks: the uncertainty of attack classification is lower than that of the other two tasks.
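The rescaling of the reported RMSE can be checked with a short sketch. RMSE is computed on the magnified labels, and dividing by 255 converts it back to the original $\varepsilon$ scale. The helper names are hypothetical; 6.04 is the MNIST value quoted above.

```python
import math


def rmse(predictions, targets):
    """Root mean square error between two equally long sequences."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def to_epsilon_scale(scaled_rmse, factor=255.0):
    """Convert an RMSE measured on labels magnified by `factor` back to the original scale."""
    return scaled_rmse / factor


# An RMSE of 6.04 on magnified labels is 6.04/255 on the original scale.
mnist_rmse_original_scale = to_epsilon_scale(6.04)
```

Because RMSE is linear in a uniform rescaling of both predictions and labels, the division by 255 is exact rather than an approximation.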
4.3 Scalability of MTAA
We highlight the scalability of MTAA from two aspects. (1) Model architecture: since the accuracy on PGD and C&W drops dramatically when the hyperparameter is taken into account in AAP (Table 4 and 5), it is necessary to concentrate on hyperparameter regression. Thanks to the scalable design of MTAA, we can add a stronger feature extractor for hyperparameter representation before the regression layer to reduce the unsatisfactory RMSE of hyperparameter regression. (2) Attribution scenario: [21] and [22] treat AAP as a single-label classification task, so the combination explosion problem appears as the number of attack algorithms and victim models increases. To examine this problem, we conduct experiments with 2 attack algorithms and 3 victim models, and with 3 attack algorithms and 5 victim models, using the hyperparameter setting in Table 1. The results are shown in Table 9 and 10, respectively. The accuracy of the single-label classifiers ([21, 22] and our pre-experiment) drops dramatically when we increase the number of attack algorithms and victim models. Moreover, the accuracy of the single-label classifiers is lowest on PGD and C&W. We believe this is because the hyperparameters of PGD and C&W are harder to recognize than that of FGSM, which is consistent with the conclusion of our pre-experiment. In contrast, MTAA performs stably on the two classification tasks as the number of attack algorithms and victim models increases, and its performance on the regression task fluctuates within a small range. Therefore, AAP should be treated as a multi-task learning problem rather than a single-label classification problem.
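The combination explosion can be illustrated by counting output units, assuming every attack is applied to every victim model with every hyperparameter value. FGSM's 20 values follow from the range 10/255 to 200/255 in steps of 10/255 in Table 1; the counts used for PGD and C&W are hypothetical placeholders, and the function names are illustrative.

```python
def single_label_classes(hyper_counts, n_victims):
    """Number of classes when each (attack, hyperparameter, victim) triple is one label."""
    return sum(hyper_counts.values()) * n_victims


def multitask_outputs(n_attacks, n_victims):
    """Output units for two classification heads plus one scalar regression output."""
    return n_attacks + n_victims + 1


# FGSM: 20 values from Table 1; PGD and C&W counts are hypothetical placeholders.
small = {"FGSM": 20, "PGD": 10}
large = {"FGSM": 20, "PGD": 10, "C&W": 10}

small_single = single_label_classes(small, n_victims=3)
large_single = single_label_classes(large, n_victims=5)
small_multi = multitask_outputs(len(small), 3)
large_multi = multitask_outputs(len(large), 5)
```

Under these assumed counts the single-label space grows multiplicatively with the scenario, while the multi-task output size grows only additively.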
4.4 False Alarms
In the previous setting, the dataset contains only adversarial examples. We further consider false alarms caused by clean images, i.e., cases where the adversarial detector cannot reliably distinguish clean examples from adversarial examples. In this case, the attack algorithm task has four labels (FGSM, PGD, C&W and clean), the victim model task has six labels (InceptionV3, ResNet18, ResNet50, VGG16, VGG19 and clean), and the hyperparameter of clean images is set to 0. As shown in Table 11 and 12, our MTAA greatly outperforms the single-task baseline when facing misclassified clean images. Moreover, MTAA is more stable than the single-task baseline under false alarms. The hyperparameter regression RMSE of the single-task baseline rises from 8.05 to 11.23 on ImageNet, which means the single-task baseline cannot handle false alarms well in terms of hyperparameter regression on ImageNet.
4.5 Ablation Study
To validate the effectiveness of the different components of the adversarial-only extractor module in MTAA, such as the global shared layers (GSL), the loss weighting and the task-specific layers (TSL), we conduct ablation experiments; the results are shown in Table 13 and 14. From the second row of Table 13 and 14, we can see that a suitable weight balancing method greatly influences the performance of the individual and overall attribution tasks, especially victim model classification. The results in the third row of both tables show that it is better to deploy LFE for different tasks, which can further extract task-specific features. Comparing the fourth and third rows of both tables, we conclude that a deeper network for GSL (ResNet18 to ResNet50 for MNIST and ResNet50 to ResNet101 for ImageNet) performs better because it has stronger feature extraction capability. We also provide ablation results for the perturbation extractor (PE) module of MTAA in the last row of Table 13 and 14, which indicate that adding PE leverages the features of both clean and adversarial examples and improves the attribution performance for AAP.
5 Conclusion
The Adversarial Attribution Problem (AAP) is a vital part of Reverse Engineering of Deceptions, which aims to recognize the signatures behind adversarial examples, such as attack algorithm, victim model and hyperparameter. Since few works concentrate on it, we comprehensively study the attributability of adversarial examples. We then propose a multi-task learning framework with an uncertainty weighted loss to solve this problem efficiently. The proposed multi-task adversarial attribution (MTAA) framework is scalable and relieves the combination explosion problem of traditional AAP solutions. The experimental results on ImageNet and MNIST show that MTAA achieves state-of-the-art performance.
Acknowledgments
This work was partially supported by National Natural Science Foundation of China (No. 61772284).
References
- [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Imagenet classification with deep convolutional neural networks. In: Communications of the ACM, pages 84–90, 2017.
- [3] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
- [4] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention, Springer, pages 234–241, 2015.
- [5] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
- [6] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In: Proceedings of Advances in Neural Information Processing Systems, pages 91–99, 2015.
- [7] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In: Proceedings of International Conference on Learning Representations, 2014.
- [8] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In: Proceedings of the IEEE European Symposium on Security and Privacy, pages 372–387, 2016.
