Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations
Davide Coppola, Hwee Kuan Lee

TL;DR
This paper introduces the Adversarial Intervention framework to analyze how specific feature maps in CNNs contribute to vulnerability against adversarial attacks, providing new insights into model robustness.
Contribution
It proposes a novel framework for studying CNN vulnerabilities at the feature map level, revealing shared vulnerable channels and their impact across different attack types.
Findings
Perturbing channels in shallow layers significantly disrupts the model's overall behavior.
The same combinations of vulnerable channels recur across attack types.
A positive correlation exists between kernel magnitude and vulnerability.
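To make the last finding concrete, here is a minimal sketch of how such a correlation could be checked. It assumes "kernel magnitude" means the L2 norm of each output channel's kernel in the first convolutional layer, which this summary does not confirm. The weights and vulnerability scores are synthetic placeholders, not results from the paper, and `kernel_magnitudes` and `pearson` are hypothetical helper names.

```python
import numpy as np

def kernel_magnitudes(weights):
    """L2 norm of each output channel's kernel.

    weights has shape (out_channels, in_channels, kh, kw).
    Using the L2 norm is an assumption; the paper may define magnitude differently.
    """
    flat = weights.reshape(weights.shape[0], -1)
    return np.linalg.norm(flat, axis=1)

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(0)

# Synthetic first-layer weights: 8 output channels, 3 input channels, 3x3 kernels.
weights = rng.normal(size=(8, 3, 3, 3))
mags = kernel_magnitudes(weights)

# Placeholder per-channel vulnerability scores (synthetic, NOT measured values).
# In practice these would come from an experiment that perturbs one channel at a time.
vulnerability = 0.5 * mags + rng.normal(scale=0.1, size=mags.shape)

r = pearson(mags, vulnerability)
print(f"Pearson r between kernel magnitude and vulnerability: {r:.3f}")
```

With real data, `vulnerability` would be replaced by whatever per-channel score the experiment produces, for example the drop in accuracy when only that channel is perturbed.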
Abstract
This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs) with the aim of enhancing the understanding of their underlying mechanisms. Despite the numerous defense methods proposed in the literature, this phenomenon is still incompletely understood. Instead of treating the entire model as vulnerable, we propose that specific feature maps learned during training contribute to the overall vulnerability. To investigate how the hidden representations learned by a CNN affect its vulnerability, we introduce the Adversarial Intervention framework. Experiments were conducted on models trained on three well-known computer vision datasets, subjecting them to attacks of different kinds. We focus on the effects that adversarial perturbations to a model's initial layer have on the overall behavior of the model. Empirical results revealed…
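The abstract does not spell out how an intervention is carried out, so the following is only one plausible reading, sketched under stated assumptions: compute first-layer feature maps for a clean input and for its adversarial counterpart, swap in the adversarial version of selected channels only, and measure how far the model's output moves. The toy two-layer model, the random-sign perturbation standing in for a real attack, and the function names (`first_layer`, `head`, `intervene`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an assumption, not the paper's architecture):
# first layer is a 1x1 convolution + ReLU, head is global average pooling + linear.
W1 = rng.normal(size=(4, 3))  # 3 input channels -> 4 feature channels
W2 = rng.normal(size=(2, 4))  # 4 pooled features -> 2 class logits

def first_layer(x):
    """Map an image of shape (3, H, W) to feature maps of shape (4, H, W)."""
    return np.maximum(np.einsum("oc,chw->ohw", W1, x), 0.0)

def head(feats):
    """Average each feature map over space, then apply the linear classifier."""
    return W2 @ feats.mean(axis=(1, 2))

def intervene(x_clean, x_adv, channels):
    """Run the model with only `channels` taken from the adversarial feature maps."""
    feats = first_layer(x_clean)  # fresh array, safe to modify in place
    feats[channels] = first_layer(x_adv)[channels]
    return head(feats)

x = rng.uniform(size=(3, 8, 8))
# Random-sign perturbation as a stand-in for a real attack such as FGSM or PGD.
x_adv = np.clip(x + 0.1 * np.sign(rng.normal(size=x.shape)), 0.0, 1.0)

clean_logits = head(first_layer(x))

# Output shift caused by perturbing each first-layer channel on its own.
shift_per_channel = [
    float(np.linalg.norm(intervene(x, x_adv, [c]) - clean_logits))
    for c in range(W1.shape[0])
]
print(shift_per_channel)
```

Ranking channels by `shift_per_channel` gives a rough per-channel vulnerability score under these assumptions; passing several indices to `intervene` would probe combinations of channels in the same way.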
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Focus
