Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters Substitution
MingWei Zhou, Xiaobing Pei

TL;DR
This paper introduces a novel approach to generate unrestricted adversarial samples by exploiting non-semantic feature clusters, effectively fooling classifiers without altering image semantics.
Contribution
It presents a new method for creating adversarial examples based on non-semantic feature clusters learned from model training, moving beyond traditional $L_p$ norm constraints.
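For context, the standard norm-bounded formulation that the paper moves away from can be written as follows. The notation here is an assumption for illustration and is not taken from the paper: $f$ is the classifier, $x$ the clean image with true label $y$, $x'$ the adversarial image, and $\epsilon$ the perturbation budget.

$$
\text{find } x' \quad \text{such that} \quad f(x') \neq y \quad \text{and} \quad \lVert x' - x \rVert_p \le \epsilon .
$$

An unrestricted attack drops the constraint $\lVert x' - x \rVert_p \le \epsilon$ and instead only requires that $x'$ keeps the same semantics as $x$ to a human observer.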
Findings
Effective in fooling classifiers in black-box and white-box settings
Adversarial samples preserve image semantics
Outperforms norm-based adversarial methods
Abstract
Most current methods generate adversarial examples under a norm constraint. As a result, many defense methods exploit this property to eliminate the impact of such attacks. In this paper, we instead introduce "unrestricted" perturbations that create adversarial samples by using spurious relations learned during model training. Specifically, we find feature clusters among non-semantic features that are strongly correlated with the model's predictions, and treat them as spurious relations learned by the model. We then create adversarial samples by using them to replace the corresponding feature clusters in the target image. Experimental evaluations show that, in both black-box and white-box settings, our adversarial examples do not change the semantics of images while still being effective at fooling an adversarially trained DNN image classifier.
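The two steps described in the abstract (scoring non-semantic feature clusters by how strongly they correlate with the model's predictions, then substituting a cluster into the target image) might be sketched roughly as below. This is a toy illustration under stated assumptions, not the authors' implementation: the function names `cluster_prediction_correlation` and `substitute_region` are hypothetical, cluster membership is assumed to be given as a binary presence matrix, and substitution is done directly in pixel space with a boolean mask as a simplification of replacing feature clusters.

```python
import numpy as np


def cluster_prediction_correlation(cluster_presence, predictions, target_class):
    """Pearson correlation between each cluster's presence and the event
    'the model predicted target_class'.

    cluster_presence: (n_samples, n_clusters) array of 0/1 (assumed given).
    predictions:      (n_samples,) array of predicted class ids.
    Returns an (n_clusters,) array; constant columns get correlation 0.
    """
    x = np.asarray(cluster_presence, dtype=float)
    y = (np.asarray(predictions) == target_class).astype(float)
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    num = (xc * yc[:, None]).sum(axis=0)
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.zeros(x.shape[1])
    nonzero = denom > 0
    corr[nonzero] = num[nonzero] / denom[nonzero]
    return corr


def substitute_region(target, donor, region_mask):
    """Copy the pixels selected by region_mask from donor into a copy of target.

    target, donor: (H, W, C) arrays of the same shape.
    region_mask:   (H, W) boolean array marking the (assumed non-semantic)
                   region to replace. The original target is left untouched.
    """
    out = target.copy()
    out[region_mask] = donor[region_mask]
    return out


# Toy usage: cluster 0 co-occurs with class 3, cluster 1 with class 1.
presence = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
preds = np.array([3, 3, 1, 1])
scores = cluster_prediction_correlation(presence, preds, target_class=3)
best_cluster = int(np.argmax(scores))  # cluster most tied to class 3

target_img = np.zeros((4, 4, 3))
donor_img = np.ones((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True  # hypothetical non-semantic region
adv_img = substitute_region(target_img, donor_img, mask)
```

In a real pipeline the mask and the cluster assignments would have to come from some procedure that separates semantic from non-semantic content; the abstract does not specify that procedure, so it is left as an input here.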
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
