Single-Class Target-Specific Attack against Interpretable Deep Learning Systems

Eldor Abdukhamidov; Mohammed Abuhamad; George K. Thiruvathukal; Hyoungshick Kim; Tamer Abuhmed

arXiv:2307.06484·cs.CV·July 14, 2023

TL;DR

This paper introduces SingleADV, a universal adversarial attack on interpretable deep learning systems that targets a single class: one perturbation makes the model confuse a specific source category with a chosen target category, while the adversarial samples keep high confidence and relevant, accurate interpretations.

Contribution

The paper proposes SingleADV, a universal attack method that optimizes a perturbation against both the classifier and its interpreter and is effective in white-box and black-box scenarios.

Findings

01. Achieves an average fooling ratio of 0.74

02. Generates adversarial samples with a confidence level of 0.78

03. Effective across multiple model architectures and interpretation methods
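This page does not define the two headline metrics, so the following is only a minimal sketch under an assumed reading: the fooling ratio is taken to be the share of perturbed source-class samples that the model assigns to the target class, and the confidence is taken to be the mean probability the model gives the target class on those fooled samples. The helper names `fooling_ratio` and `mean_target_confidence` are hypothetical, and the numbers are made up for illustration.

```python
def fooling_ratio(predicted, target):
    """Share of perturbed source-class samples predicted as `target` (assumed definition)."""
    if not predicted:
        raise ValueError("need at least one prediction")
    return sum(1 for p in predicted if p == target) / len(predicted)


def mean_target_confidence(predicted, confidences, target):
    """Mean model confidence over the samples that were fooled (assumed definition)."""
    fooled = [c for p, c in zip(predicted, confidences) if p == target]
    return sum(fooled) / len(fooled) if fooled else 0.0


# Illustrative data only: predicted labels and top-class confidences for four
# perturbed source-class images, with class 3 as the attack target.
preds = [3, 3, 1, 3]
confs = [0.9, 0.8, 0.4, 0.7]

ratio = fooling_ratio(preds, target=3)                 # 3 of 4 samples fooled -> 0.75
conf = mean_target_confidence(preds, confs, target=3)  # average over the 3 fooled samples
print(ratio, round(conf, 2))
```

Reporting both values together matters under this reading, because a perturbation could flip many labels while leaving the model unsure about the target class.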

Abstract

In this paper, we present a novel Single-class target-specific Adversarial attack called SingleADV. The goal of SingleADV is to generate a universal perturbation that deceives the target model into confusing a specific category of objects with a target category while ensuring highly relevant and accurate interpretations. The universal perturbation is stochastically and iteratively optimized by minimizing the adversarial loss that is designed to consider both the classifier and interpreter costs in targeted and non-targeted categories. In this optimization framework, ruled by the first- and second-moment estimations, the desired loss surface promotes high confidence and interpretation score of adversarial samples. By avoiding unintended misclassification of samples from other categories, SingleADV enables more effective targeted attacks on interpretable deep learning systems in both…
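The optimization described in the abstract can be illustrated with a toy NumPy sketch. Everything concrete here is an assumption made for illustration and not the authors' implementation: the classifier is a linear softmax model, the "interpreter" is a gradient-times-input attribution map (`W[c] * x`), the interpreter cost pulls the adversarial attribution toward the benign one, the "first- and second-moment estimations" are read as an Adam-style update, and the perturbation is kept inside an L-infinity ball of radius `eps`. The function name `universal_perturbation` and all hyperparameters are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def universal_perturbation(W, b, X, y, source, target, eps=0.5, lam=0.1,
                           lr=0.05, steps=300, batch=32, seed=0):
    """Toy single-class universal perturbation for a linear softmax model."""
    rng = np.random.default_rng(seed)
    n_classes, dim = W.shape
    delta, m, v = np.zeros(dim), np.zeros(dim), np.zeros(dim)
    beta1, beta2, tiny = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        # Stochastic step: draw a random mini-batch from all categories.
        idx = rng.choice(len(X), size=batch, replace=False)
        xb, yb = X[idx], y[idx]
        is_src = yb == source
        # Source samples should move to `target`; all others keep their label.
        goal = np.where(is_src, target, yb)
        probs = softmax((xb + delta) @ W.T + b)
        # Gradient of the cross-entropy classifier cost w.r.t. the input.
        grad_cls = (probs - np.eye(n_classes)[goal]) @ W
        # Assumed interpreter cost: squared gap between the adversarial
        # target-class attribution and the benign source-class attribution.
        gap = W[target] * (xb + delta) - W[source] * xb
        grad_int = 2.0 * W[target] * gap * is_src[:, None]
        g = (grad_cls + lam * grad_int).mean(axis=0)
        # Adam-style update from first- and second-moment estimates.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        step = (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + tiny)
        # Project back into the L-infinity ball so the perturbation stays small.
        delta = np.clip(delta - lr * step, -eps, eps)
    return delta


# Synthetic three-class data with a nearest-mean linear classifier.
rng = np.random.default_rng(1)
centers = rng.normal(size=(3, 8))
y = np.repeat(np.arange(3), 40)
X = centers[y] + 0.1 * rng.normal(size=(120, 8))
W, b = centers, -0.5 * (centers ** 2).sum(axis=1)

delta = universal_perturbation(W, b, X, y, source=0, target=1)
pred = ((X + delta) @ W.T + b).argmax(axis=1)
src_to_target = float((pred[y == 0] == 1).mean())           # source class sent to target
others_kept = float((pred[y != 0] == y[y != 0]).mean())     # other classes left intact
print(src_to_target, others_kept)
```

The two printed rates mirror the two goals stated in the abstract: moving the chosen category to the target while avoiding unintended misclassification of the remaining categories. On a real image model, the analytic gradients would be replaced by automatic differentiation through the network and its interpreter.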

Code & Models

Repositories

infolab-skku/singleclassadv (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications