Effective Targeted Attacks for Adversarial Self-Supervised Learning
Minseon Kim, Hyeonjeong Ha, Sooel Son, Sung Ju Hwang

TL;DR
This paper introduces a targeted adversarial attack method for self-supervised learning that improves model robustness by selecting and perturbing instances towards similar, confusing targets, especially benefiting non-contrastive SSL frameworks.
Contribution
We propose a positive mining algorithm for targeted adversarial attacks that enhances robustness in self-supervised learning, addressing limitations of untargeted attacks.
Findings
Significant robustness improvements in non-contrastive SSL frameworks.
Moderate but consistent robustness gains in contrastive SSL frameworks.
Effective adversarial examples generated by targeting similar, confusing instances.
Abstract
Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often generates ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive mining scheme for targeted adversarial attacks that generates effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target…
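To make the idea of a targeted attack concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy linear encoder, a stand-in selection rule (pick the most similar other instance in the batch, since the scoring rule is not given in this excerpt), and a cosine-similarity attack objective. All names (`encode`, `select_target`, `targeted_attack`, `eps`, `alpha`, `steps`) are hypothetical and chosen for illustration only.

```python
import math
import random


def encode(W, x):
    """Toy linear encoder z = W x (assumed stand-in for an SSL backbone)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb)


def select_target(W, batch, i):
    """Assumed rule: index of the most similar *other* instance in the batch."""
    zi = encode(W, batch[i])
    others = [j for j in range(len(batch)) if j != i]
    return max(others, key=lambda j: cosine(zi, encode(W, batch[j])))


def cosine_grad_wrt_input(W, x, t):
    """Analytic gradient of cos(W x, t) with respect to x."""
    z = encode(W, x)
    nz = math.sqrt(sum(v * v for v in z))
    nt = math.sqrt(sum(v * v for v in t))
    c = cosine(z, t)
    dz = [tk / (nz * nt) - c * zk / (nz * nz) for zk, tk in zip(z, t)]
    # Chain rule through the linear encoder: dx = W^T dz
    return [sum(W[k][d] * dz[k] for k in range(len(W))) for d in range(len(x))]


def sign(v):
    return 0.0 if v == 0 else math.copysign(1.0, v)


def targeted_attack(W, x, x_target, eps=0.1, alpha=0.02, steps=10):
    """Signed-gradient steps that pull x towards the target's embedding.

    The perturbation is clipped to an L-infinity ball of radius eps, and the
    best perturbation seen so far (highest similarity) is returned.
    """
    t = encode(W, x_target)
    delta = [0.0] * len(x)
    best_delta, best_sim = list(delta), cosine(encode(W, x), t)
    for _ in range(steps):
        x_adv = [a + d for a, d in zip(x, delta)]
        g = cosine_grad_wrt_input(W, x_adv, t)
        # Ascent on similarity to the target, then projection onto the ball
        delta = [max(-eps, min(eps, d + alpha * sign(gi)))
                 for d, gi in zip(delta, g)]
        sim = cosine(encode(W, [a + d for a, d in zip(x, delta)]), t)
        if sim > best_sim:
            best_delta, best_sim = list(delta), sim
    return best_delta, best_sim


# Small demo on random data (fixed seed for reproducibility)
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
batch = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
anchor = 0
target = select_target(W, batch, anchor)
sim_before = cosine(encode(W, batch[anchor]), encode(W, batch[target]))
delta, sim_after = targeted_attack(W, batch[anchor], batch[target])
```

An untargeted attack would instead ascend the training loss in whatever direction increases it; the sketch differs only in that the perturbation is steered towards one chosen instance. In a real setting the encoder would be a deep network and the gradient would come from automatic differentiation rather than the closed form used here.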
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
