Single-Class Target-Specific Attack against Interpretable Deep Learning Systems
Eldor Abdukhamidov, Mohammed Abuhamad, George K. Thiruvathukal, Hyoungshick Kim, Tamer Abuhmed

TL;DR
This paper introduces SingleADV, a universal adversarial attack on interpretable deep learning systems. A single perturbation makes the model misclassify one specific source class as a chosen target class while keeping prediction confidence high and the accompanying interpretations relevant.
Contribution
The paper proposes SingleADV, a universal attack method that optimizes one perturbation against both the classifier cost and the interpreter cost, and that is effective in white-box and black-box scenarios.
Findings
Achieves an average fooling ratio of 0.74
Generates adversarial samples with a confidence level of 0.78
Effective across multiple model architectures and interpretation methods
Abstract
In this paper, we present a novel Single-class target-specific Adversarial attack called SingleADV. The goal of SingleADV is to generate a universal perturbation that deceives the target model into confusing a specific category of objects with a target category while ensuring highly relevant and accurate interpretations. The universal perturbation is stochastically and iteratively optimized by minimizing the adversarial loss that is designed to consider both the classifier and interpreter costs in targeted and non-targeted categories. In this optimization framework, ruled by the first- and second-moment estimations, the desired loss surface promotes high confidence and interpretation score of adversarial samples. By avoiding unintended misclassification of samples from other categories, SingleADV enables more effective targeted attacks on interpretable deep learning systems in both…
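The optimization loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy linear softmax classifier, uses plain cross-entropy as a stand-in for the classifier cost, and omits the interpreter cost entirely. Every name here (`universal_perturbation`, `make_grad_fn`, `demo`, the weights `W`, the bound `eps`) is hypothetical. What it does show is one shared perturbation updated with first- and second-moment gradient estimates (an Adam-style rule), pushing source-class samples toward the target class while asking all other samples to keep their labels.

```python
import math
import random


def softmax(z):
    mx = max(z)
    e = [math.exp(a - mx) for a in z]
    s = sum(e)
    return [a / s for a in e]


def scores(W, x):
    # Toy linear model (assumption): one weight row per class.
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def predict(W, x):
    s = scores(W, x)
    return s.index(max(s))


def make_grad_fn(W, source, target):
    """Gradient of cross-entropy w.r.t. the perturbation (classifier cost only)."""
    def grad_fn(sample, delta):
        x, label = sample
        xp = [a + d for a, d in zip(x, delta)]
        p = softmax(scores(W, xp))
        # Source-class samples are pushed to `target`; others keep their label.
        goal = target if label == source else label
        p[goal] -= 1.0  # d(cross-entropy)/d(scores) = p - onehot(goal)
        return [sum(W[k][i] * p[k] for k in range(len(W))) for i in range(len(x))]
    return grad_fn


def universal_perturbation(grad_fn, samples, dim, eps, lr=0.05, beta1=0.9,
                           beta2=0.999, steps=200, batch_size=None, seed=0):
    rng = random.Random(seed)
    n = len(samples) if batch_size is None else min(batch_size, len(samples))
    delta = [0.0] * dim
    m = [0.0] * dim  # first-moment estimate of the gradient
    v = [0.0] * dim  # second-moment estimate of the gradient
    for t in range(1, steps + 1):
        batch = rng.sample(samples, n)  # n < len(samples) gives stochastic updates
        g = [0.0] * dim
        for s in batch:
            gs = grad_fn(s, delta)
            for i in range(dim):
                g[i] += gs[i] / n
        for i in range(dim):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            delta[i] -= lr * m_hat / (math.sqrt(v_hat) + 1e-8)
            # Keep the perturbation inside an L-infinity ball of radius eps.
            delta[i] = max(-eps, min(eps, delta[i]))
    return delta


def demo():
    # Three classes in 2D; class 0 is the source, class 2 the target (toy data).
    W = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    samples = [([2.0, 0.0], 0), ([2.2, 0.1], 0), ([1.8, -0.1], 0),
               ([-2.0, 0.0], 1), ([-2.2, 0.1], 1), ([-1.8, -0.1], 1),
               ([0.0, 2.0], 2), ([0.1, 2.2], 2), ([-0.1, 1.8], 2)]
    source, target, eps = 0, 2, 1.5
    delta = universal_perturbation(make_grad_fn(W, source, target),
                                   samples, dim=2, eps=eps)
    shifted = [([a + d for a, d in zip(x, delta)], y) for x, y in samples]
    src = [predict(W, x) == target for x, y in shifted if y == source]
    rest = [predict(W, x) == y for x, y in shifted if y != source]
    # Fraction of source samples sent to the target, fraction of others unchanged.
    return delta, sum(src) / len(src), sum(rest) / len(rest)
```

Calling `demo()` returns the learned perturbation together with two fractions: how many source-class samples land in the target class, and how many samples from the other classes keep their original prediction. A faithful version would add an interpreter term to the per-sample loss and replace the linear model with a real network.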
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
