Scalable Attribution of Adversarial Attacks via Multi-Task Learning

Zhongyi Guo; Keji Han; Yao Ge; Wei Ji; Yun Li

arXiv:2302.14059·cs.LG·March 1, 2023


TL;DR

This paper introduces MTAA, a multi-task learning framework for scalable attribution of adversarial attacks. It recognizes the attack algorithm, victim model, and hyperparameter behind an adversarial example simultaneously, giving defenders insight into which defense to apply.

Contribution

The paper proposes a novel multi-task learning approach for adversarial attribution that considers relationships between attack signatures, addressing limitations of single-label classification methods.

Findings

1. MTAA effectively recognizes attack signatures on MNIST and ImageNet.

2. The framework handles false alarms and improves attribution accuracy.

3. MTAA scales to multiple attack signatures.

Abstract

Deep neural networks (DNNs) can be easily fooled by adversarial attacks during the inference phase, when attackers add imperceptible perturbations to original examples to produce adversarial examples. Many works focus on adversarial detection and adversarial training to defend against adversarial attacks. However, few works explore the tool-chains behind adversarial examples, which can help defenders seize clues about the originator of the attack and their goals, and provide insight into the most effective defense algorithm against the corresponding attacks. To fill this gap, it is necessary to develop techniques that can recognize the tool-chains used to generate adversarial examples, a task called the Adversarial Attribution Problem (AAP). In this paper, AAP is defined as the recognition of three signatures, i.e., attack algorithm, victim model and hyperparameter.…

Tables

Table 1: The attribution scenario of adversarial attribution.

Attack Algorithm	Hyperparameter	Victim Model
FGSM (L_{\infty})	ε: 10/255-200/255 (10/255)	InceptionV3, ResNet18, ResNet50, VGG16, VGG19
PGD (L_{\infty})	ε: 10/255-200/255 (10/255); α: 10/255; step: 100	InceptionV3, ResNet18, ResNet50, VGG16, VGG19
C&W (L_{2})	κ: 5-100 (5); C: 50; step: 500	InceptionV3, ResNet18, ResNet50, VGG16, VGG19
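The label space implied by Table 1 can be enumerated with a short sketch. It assumes the value in parentheses after each range is the step size (so ε takes 20 values from 10/255 to 200/255 and κ takes 20 values from 5 to 100); the names `ATTACKS`, `VICTIMS`, and `signature_space` are illustrative and do not come from the paper. Under that reading, the counts agree with the 15-class and 300-class tasks named in the captions of Tables 2 to 5.

```python
from itertools import product

# Varied hyperparameter per attack, read from Table 1.
# Assumption: the value in parentheses is the step size of the range.
ATTACKS = {
    "FGSM": [k * 10 for k in range(1, 21)],  # epsilon numerators: 10/255 ... 200/255
    "PGD": [k * 10 for k in range(1, 21)],   # epsilon numerators: 10/255 ... 200/255
    "C&W": [k * 5 for k in range(1, 21)],    # kappa: 5 ... 100
}
VICTIMS = ["InceptionV3", "ResNet18", "ResNet50", "VGG16", "VGG19"]


def signature_space():
    """Return every (attack, victim, hyperparameter) triple as a flat list."""
    return [
        (attack, victim, value)
        for attack, victim in product(ATTACKS, VICTIMS)
        for value in ATTACKS[attack]
    ]


signatures = signature_space()
pairs = {(attack, victim) for attack, victim, _ in signatures}
print(len(pairs), len(signatures))  # 15 300
```

Treating each triple as one class is what makes the single-label formulation grow multiplicatively with every new attack, victim model, or hyperparameter value.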

Table 2: Results (%) for the Attack Algorithm + Victim Model 15-class classification task with ResNet50 as the classifier on MNIST.

	InceptionV3	ResNet18	ResNet50	VGG16	VGG19	Average
FGSM	100.00	100.00	100.00	100.00	100.00	100.00
PGD	100.00	97.00	100.00	99.75	99.25	99.20
C&W	99.75	99.50	100.00	100.00	99.75	99.80
Average	99.92	98.83	100.00	99.92	99.67	99.67

Table 3: Results (%) for the Attack Algorithm + Victim Model 15-class classification task with ResNet101 as the classifier on ImageNet.

	InceptionV3	ResNet18	ResNet50	VGG16	VGG19	Average
FGSM	99.50	99.25	99.00	99.50	92.50	97.95
PGD	99.00	98.00	98.50	95.25	98.00	97.75
C&W	99.75	98.50	98.00	98.75	95.00	98.00
Average	99.42	98.58	98.50	97.83	95.17	97.90

Table 4: Results (%) for the Attack Algorithm + Victim Model + Hyperparameter 300-class classification task with ResNet50 as the classifier on MNIST.

	InceptionV3	ResNet18	ResNet50	VGG16	VGG19	Average
FGSM	97.25	99.50	99.75	99.75	98.00	98.85
PGD	70.00	86.50	87.00	78.50	82.50	80.90
C&W	13.00	14.75	16.00	12.00	7.75	12.70
Average	60.08	66.92	67.58	63.42	62.75	64.15

Table 5: Results (%) for the Attack Algorithm + Victim Model + Hyperparameter 300-class classification task with ResNet101 as the classifier on ImageNet.

	InceptionV3	ResNet18	ResNet50	VGG16	VGG19	Average
FGSM	96.25	89.75	97.00	92.50	95.25	94.15
PGD	71.25	57.50	60.75	66.25	61.50	63.45
C&W	26.00	25.00	28.75	23.50	27.00	26.05
Average	64.50	57.42	62.17	60.75	61.25	61.22

Table 6: Architecture of the auto-encoder.

Structure Name	Output Size	Architecture
Encoder	8*8	[2*2, 512], stride 2, padding 1
	4*4	2*2 maxpooling, stride 2
	3*3	[2*2, 256], stride 2, padding 1
	2*2	2*2 maxpooling, stride 1
Decoder	5*5	[3*3, 512], stride 2
	11*11	[5*5, 256], stride 2, padding 1
	14*14	[6*6, 256], stride 1, padding 1
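The output sizes in Table 6 can be checked with the standard size formulas for convolution, pooling, and transposed convolution. This is only a sketch under assumptions the table does not state: the input feature map is taken to be 14×14 (the only size for which the first row yields 8×8), the decoder layers are taken to be transposed convolutions, and a missing padding entry is read as zero. The function names are illustrative.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output size of a convolution or max-pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding=0):
    """Output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_sizes(input_size=14):
    """Follow the spatial size through the layers listed in Table 6."""
    sizes = []
    s = conv_out(input_size, kernel=2, stride=2, padding=1)  # encoder conv, 512 channels
    sizes.append(s)
    s = conv_out(s, kernel=2, stride=2)                      # 2*2 max pooling, stride 2
    sizes.append(s)
    s = conv_out(s, kernel=2, stride=2, padding=1)           # encoder conv, 256 channels
    sizes.append(s)
    s = conv_out(s, kernel=2, stride=1)                      # 2*2 max pooling, stride 1
    sizes.append(s)
    s = deconv_out(s, kernel=3, stride=2)                    # decoder layer, 512 channels
    sizes.append(s)
    s = deconv_out(s, kernel=5, stride=2, padding=1)         # decoder layer, 256 channels
    sizes.append(s)
    s = deconv_out(s, kernel=6, stride=1, padding=1)         # decoder layer, 256 channels
    sizes.append(s)
    return sizes


print(trace_sizes())  # [8, 4, 3, 2, 5, 11, 14]
```

Under these assumptions the decoder returns to the 14×14 size of the assumed input, which is what a reconstruction loss between input and output feature maps would require.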

Table 7: Results (% / RMSE) of [21] and [22] as single-task baselines and MTAA on MNIST. The first two rows are the results of the backbones of [21] and [22] trained individually for the three signatures. The third row is the performance of our MTAA.

	Backbone	Model	FLOPS(G)	Params(M)	Attack Algorithm(%) ↑	Victim Model(%) ↑	Hyperparameter(RMSE) ↓	Δ_MTL (%) ↑
Single Task	Self-built	[21]	-	-	95.36	93.47	9.64	+0.00
Single Task	ResNet-50	[22]	0.97	71	100	99.81	6.46	+0.00
MTL	ResNet-50	MTAA	0.77	48	100	99.88	6.04	+1.17

Table 8: Results (% / RMSE) of [21], [22], and ResNet101 as single-task baselines and MTAA on ImageNet. The first three rows are the results of the backbones of [21], [22], and ResNet101 trained individually for the three signatures. The last two rows are the corresponding MTAA performance.

	Backbone	Model	FLOPS(G)	Params(M)	Attack Algorithm(%) ↑	Victim Model(%) ↑	Hyperparameter(RMSE) ↓	Δ_MTL (%) ↑
Single Task	Self-built	[21]	-	-	88.54	83.22	12.97	+0.00
	ResNet-50	[22]	12	71	97.43	93.25	7.93	+0.00
	ResNet-101		24	128	98.68	94.72	7.33	+0.00
MTL	ResNet-50	MTAA	9	48	99.68	96.95	7.32	+4.66
MTL	ResNet-101	MTAA	21	108	99.78	97.84	6.79	+3.93

Table 9: Results (% / RMSE) for 2 attack algorithms, 3 victim models and 3 attack algorithms, 5 victim models on MNIST. Our pre-experiment and MTAA both use ResNet50 as the backbone. [21], [22], and our pre-experiment solve the Attack+Victim+Hyperparameter single-label classification problem and thus have no RMSE result for regression. MTAA reports classification accuracy for attack algorithm and victim model recognition as well as regression RMSE for hyperparameter recognition. Note that [22] and our pre-experiment use the same backbone on MNIST and thus obtain the same results.

Attack Algorithms	Victim Models	[21] (Attack Algorithm+Victim Model+Hyperparameter)	[22] / Pre-experiment (Attack Algorithm+Victim Model+Hyperparameter)	MTAA: Attack Algorithm / Victim Model	MTAA: Hyperparameter
FGSM, PGD, C&W	InceptionV3, ResNet18, ResNet50, VGG16, VGG19	57.53	64.15	100 / 99.88	6.04
FGSM, PGD	InceptionV3, ResNet18, VGG16	83.74	92.21	100	3.72
FGSM, C&W	InceptionV3, ResNet18, VGG16	55.76	68.33	100 / 99.96	5.71
FGSM, PGD	InceptionV3, ResNet50, VGG19	85.54	91.71	100	2.06
PGD, C&W	InceptionV3, ResNet50, VGG19	50.06	56.62	100 / 99.96	7.02

Table 10: Results (% / RMSE) for 2 attack algorithms, 3 victim models and 3 attack algorithms, 5 victim models on ImageNet. Our pre-experiment and MTAA both use ResNet101 as the backbone. [21], [22], and our pre-experiment solve the Attack+Victim+Hyperparameter single-label classification problem and thus have no RMSE result for regression. MTAA reports classification accuracy for attack algorithm and victim model recognition as well as regression RMSE for hyperparameter recognition.

		Attack Algorithm+Victim Model+Hyperparameter			MTAA		
Attack Algorithms	Victim Models	[21]	[22]	Pre-experiment	Attack Algorithm	Victim Model	Hyperparameter
FGSM, PGD, C&W	InceptionV3, ResNet18, ResNet50, VGG16, VGG19	50.71	59.73	61.22	99.78	97.84	6.79
FGSM, PGD	InceptionV3, ResNet18, VGG16	71.26	78.53	82.96	99.97	98.93	5.96
FGSM, C&W	InceptionV3, ResNet18, VGG16	49.98	59.42	62.33	99.97	98.24	7.76
FGSM, PGD	InceptionV3, ResNet50, VGG19	63.76	71.32	84.07	99.93	98.69	6.02
PGD, C&W	InceptionV3, ResNet50, VGG19	39.98	47.21	48.04	99.97	98.21	8.01

Table 11: Results (% / RMSE) under false alarms for our MTAA and the single-task baseline on MNIST.

	Backbone	Model	FLOPS(G)	Params(M)	Attack Algorithm(%) ↑	Victim Model(%) ↑	Hyperparameter(RMSE) ↓	Δ_MTL (%) ↑
Single Task	ResNet-50		12	71	100	99.89	7.01	+0.00
MTL	ResNet-50	MTAA	9	48	100	99.98	6.51	+2.41

Table 12: Results (% / RMSE) under false alarms for our MTAA and the single-task baseline on ImageNet.

	Backbone	Model	FLOPS(G)	Params(M)	Attack Algorithm(%) ↑	Victim Model(%) ↑	Hyperparameter(RMSE) ↓	Δ_MTL (%) ↑
Single Task	ResNet-101		24	128	97.58	94.45	11.23	+0.00
MTL	ResNet-101	MTAA	21	108	99.81	96.41	7.21	+13.39

Table 13: Results (% / RMSE) of the ablation study on different model architectures of MTAA on MNIST.

Architecture of MTAA	Attack Algorithm	Victim Model	Hyperparameter
ResNet18+simple add loss	98.92	84.72	7.88
ResNet18+Uncertainty loss weight	99.54	97.84	7.21
ResNet18+Uncertainty loss weight+TSL	99.79	98.37	7.02
ResNet50+Uncertainty loss weight+TSL	99.94	99.26	6.42
ResNet50+Uncertainty loss weight+TSL+PE	100	99.88	6.04

Table 14: Results (% / RMSE) of the ablation study on different model architectures of MTAA on ImageNet.

Architecture of MTAA	Attack Algorithm	Victim Model	Hyperparameter
ResNet50+simple add loss	98.12	70.82	8.7
ResNet50+Uncertainty loss weight	99.3	95.98	7.9
ResNet50+Uncertainty loss weight+TSL	99.48	96.1	7.81
ResNet101+Uncertainty loss weight+TSL	99.61	97.13	7.25
ResNet101+Uncertainty loss weight+TSL+PE	99.78	97.84	6.79

Equations

x' = x + \epsilon \cdot \mathrm{sign}(\nabla_{x} \ell(h(x; \theta), u))
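The single-step update above can be illustrated on a toy model. The sketch below is not the paper's setup: it assumes a logistic-regression classifier h(x; θ) = sigmoid(w·x + b) with binary cross-entropy loss, chosen only because its input gradient has the closed form (p - u)·w. The names `loss_gradient` and `fgsm` and the example values are hypothetical.

```python
import numpy as np


def loss_gradient(x, u, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for a logistic model.

    Assumption: h(x; theta) = sigmoid(w @ x + b), label u in {0, 1}.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - u) * w


def fgsm(x, u, w, b, eps):
    """One FGSM step: move every coordinate by eps along the sign of the gradient."""
    return x + eps * np.sign(loss_gradient(x, u, w, b))


# Hypothetical example values, not taken from the paper.
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, 0.4, 0.6])
x_adv = fgsm(x, u=1, w=w, b=b, eps=10 / 255)
print(np.abs(x_adv - x))  # every coordinate moved by exactly eps
```

Because only the sign of the gradient is used, the perturbation has the same magnitude ε in every coordinate, which is why ε is the natural hyperparameter to attribute for this attack.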

x'_{t+1} = \mathrm{Clip}_{x, \epsilon}(x'_{t} + \alpha \cdot \mathrm{sign}(\nabla_{x} \ell(x, u; \theta)))
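The iterative update can be sketched in the same toy setting. This is an illustration under stated assumptions, not the paper's implementation: the model is a logistic-regression classifier with binary cross-entropy loss, the gradient is evaluated at the current iterate (the usual PGD convention), and Clip is implemented as projection onto the L∞ ball of radius ε around the original x. The function names and example values are hypothetical.

```python
import numpy as np


def loss_gradient(x, u, w, b):
    """Input gradient of binary cross-entropy for an assumed logistic model."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - u) * w


def pgd(x, u, w, b, eps, alpha, steps):
    """Iterated signed-gradient steps, each followed by clipping to the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        # Step of size alpha along the sign of the gradient at the current iterate.
        x_adv = x_adv + alpha * np.sign(loss_gradient(x_adv, u, w, b))
        # Clip_{x, eps}: keep every coordinate within eps of the original input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Hypothetical example values; alpha and the step count mirror Table 1.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.4, 0.6])
x_adv = pgd(x, u=1, w=w, b=0.1, eps=20 / 255, alpha=10 / 255, steps=100)
print(np.max(np.abs(x_adv - x)) <= 20 / 255 + 1e-12)
```

The clipping step is what ties the final perturbation size to ε regardless of how many steps are taken, while α and the step count only shape the path inside the ball.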

\min \| \rho \|_{p} + e \cdot t(x + \rho), \quad \text{s.t. } x + \rho \in [0, 1]^{m}.

L_{MSE_{1}} = \frac{1}{n} \sum_{j}^{n} (x_{j}^{fm'} - G(R(x_{j}^{fm'})) - x_{j}^{fm})^{2}.

L_{CE_{1}} = \frac{1}{n} \sum_{j} \sum_{t_{1} = 1}^{m_{1}} Q_{j}^{t_{1}} \log(P_{j}^{t_{1}}),

L_{CE_{2}} = \frac{1}{n} \sum_{j} \sum_{t_{2} = 1}^{m_{2}} Q_{j}^{t_{2}} \log(P_{j}^{t_{2}}).

Y = v\tau + z\gamma + \delta.

L_{MSE_{2}} = \frac{1}{m_{3}} \sum_{j}^{m_{3}} (\hat{Y}_{j} - \overline{Y}_{j})^{2}.

p(y \mid f^{W}(x), \sigma_{1}) = \mathcal{N}(f^{W}(x), \sigma_{1}^{2}),

p(y \mid f^{W}(x), \sigma_{2}) = \mathrm{softmax}\left(\frac{1}{\sigma_{2}^{2}} f^{W}(x)\right).

p(s_{1}, \dots, s_{k} \mid f^{W}(x)) = p(s_{1} \mid f^{W}(x)) \dots p(s_{k} \mid f^{W}(x)).

\log p(y \mid f^{W}(x), \sigma_{1}) \propto -\frac{1}{2\sigma_{1}^{2}} \| y - f^{W}(x) \|^{2} - \log \sigma_{1},

\log p(y \mid f^{W}(x), \sigma_{2}) = \frac{1}{\sigma_{2}^{2}} f_{c}^{W}(x) - \log \sum_{c'} \exp\left(\frac{1}{\sigma_{2}^{2}} f_{c'}^{W}(x)\right).

\begin{aligned}
L(W, \sigma_{1}, \sigma_{2}, \sigma_{3}) &= -\log p(y_{1}, y_{2}, y_{3} = c \mid f^{W}(x)) \\
&= -\log \left[ \mathcal{N}(y_{1}; f^{W}(x), \sigma_{1}^{2}) \cdot \mathrm{softmax}(y_{2} = c; f^{W}(x), \sigma_{2}) \cdot \mathrm{softmax}(y_{3} = c; f^{W}(x), \sigma_{3}) \right] \\
&= \frac{1}{2\sigma_{1}^{2}} \| y_{1} - f^{W}(x) \|^{2} + \log \sigma_{1} - \log p(y_{2} = c \mid f^{W}(x), \sigma_{2}) - \log p(y_{3} = c \mid f^{W}(x), \sigma_{3}) \\
&= \frac{1}{2\sigma_{1}^{2}} L_{1}(W) + \frac{1}{\sigma_{2}^{2}} L_{2}(W) + \frac{1}{\sigma_{3}^{2}} L_{3}(W) + \log \sigma_{1} + \log \frac{\sum_{c'} \exp\left(\frac{1}{\sigma_{2}^{2}} f_{c'}^{W}(x)\right)}{\left(\sum_{c'} \exp(f_{c'}^{W}(x))\right)^{\frac{1}{\sigma_{2}^{2}}}} + \log \frac{\sum_{c'} \exp\left(\frac{1}{\sigma_{3}^{2}} f_{c'}^{W}(x)\right)}{\left(\sum_{c'} \exp(f_{c'}^{W}(x))\right)^{\frac{1}{\sigma_{3}^{2}}}} \\
&\approx \frac{1}{2\sigma_{1}^{2}} L_{1}(W) + \frac{1}{\sigma_{2}^{2}} L_{2}(W) + \frac{1}{\sigma_{3}^{2}} L_{3}(W) + \log \sigma_{1} + \log \sigma_{2} + \log \sigma_{3}
\end{aligned}
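The final approximate form of the uncertainty-weighted loss can be written as a small function. This is a sketch, not the authors' code: it assumes the three per-task losses L_1 (regression), L_2 and L_3 (classification) are already computed as scalars, and it parameterizes each task by log σ_i, a common choice that keeps σ_i positive. The function name and argument names are hypothetical.

```python
import numpy as np


def uncertainty_weighted_loss(l_reg, l_cls_a, l_cls_b, log_sigmas):
    """Combine one regression loss and two classification losses.

    Implements 1/(2*s1^2)*L1 + 1/s2^2*L2 + 1/s3^2*L3 + log s1 + log s2 + log s3,
    where log_sigmas = (log s1, log s2, log s3) would be learned with the weights W.
    """
    s1, s2, s3 = log_sigmas
    return (
        0.5 * np.exp(-2.0 * s1) * l_reg   # 1 / (2 sigma_1^2) * L_1
        + np.exp(-2.0 * s2) * l_cls_a     # 1 / sigma_2^2 * L_2
        + np.exp(-2.0 * s3) * l_cls_b     # 1 / sigma_3^2 * L_3
        + s1 + s2 + s3                    # log sigma_1 + log sigma_2 + log sigma_3
    )


# With every sigma equal to 1 the loss reduces to 0.5 * L1 + L2 + L3.
print(uncertainty_weighted_loss(2.0, 1.0, 3.0, (0.0, 0.0, 0.0)))  # 5.0
```

A larger σ_i down-weights task i, while the log σ_i terms penalize inflating σ_i without bound, so the balance between tasks is learned rather than hand-tuned.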

\Delta_{MTL} = \frac{1}{T} \sum_{k = 1}^{T} (-1)^{o_{k}} (M_{f, k} - M_{B, k}) / M_{B, k}.
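As a worked check of this metric, the sketch below recomputes Δ_MTL for the ResNet-50 MTAA row of Table 8 against the [22] single-task baseline. It assumes o_k = 1 when a lower value is better (RMSE) and o_k = 0 otherwise, and that the result is reported as a percentage; the function name `delta_mtl` is illustrative.

```python
def delta_mtl(mtl, baseline, lower_is_better):
    """Average relative gain of the multi-task model over the baseline, in percent.

    mtl, baseline: per-task metrics M_{f,k} and M_{B,k}.
    lower_is_better: per-task flags playing the role of o_k (assumed meaning).
    """
    terms = [
        (-1) ** int(o) * (m - b) / b
        for m, b, o in zip(mtl, baseline, lower_is_better)
    ]
    return 100.0 * sum(terms) / len(terms)


# Table 8, ImageNet: MTAA (ResNet-50) versus the [22] single-task baseline.
value = delta_mtl(
    mtl=[99.68, 96.95, 7.32],          # attack acc., victim acc., hyperparameter RMSE
    baseline=[97.43, 93.25, 7.93],
    lower_is_better=[False, False, True],
)
print(round(value, 2))  # 4.66
```

The sign flip for RMSE means an improvement on any task contributes positively, so a single number summarizes gains across accuracy and error metrics.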


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Full text

Scalable Attribution of Adversarial Attacks via Multi-Task Learning

Zhongyi Guo