Downstream-agnostic Adversarial Examples
Ziqi Zhou, Shengshan Hu, Ruizhi Zhao, Qian Wang, Leo Yu Zhang, Junhui Hou, Hai Jin

TL;DR
This paper introduces AdvEncoder, a framework that crafts universal adversarial examples from a pre-trained encoder so that they fool the downstream tasks built on that encoder, exposing security vulnerabilities in self-supervised models.
Contribution
The authors present AdvEncoder as the first framework for generating downstream-agnostic universal adversarial examples against pre-trained encoders, showing that a publicly released encoder is a security risk for every model that inherits it.
Findings
Successfully attacks downstream tasks without dataset knowledge
High transferability of adversarial perturbations
Four defenses tested against AdvEncoder
Abstract
Self-supervised learning usually uses a large amount of unlabeled data to pre-train an encoder that can serve as a general-purpose feature extractor, so that downstream users only need to fine-tune it to enjoy the benefits of a "large model". Despite this promising prospect, the security of pre-trained encoders has not yet been thoroughly investigated, especially when the pre-trained encoder is publicly available for commercial use. In this paper, we propose AdvEncoder, the first framework for generating downstream-agnostic universal adversarial examples based on the pre-trained encoder. AdvEncoder aims to construct a universal adversarial perturbation or patch for a set of natural images that can fool all the downstream tasks inheriting the victim pre-trained encoder. Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors…
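To make the idea of a label-free, universal perturbation concrete, here is a minimal sketch. It is not the AdvEncoder algorithm, whose details are not given in the excerpt above. It assumes a toy encoder (a fixed linear map followed by tanh) and swaps in plain projected sign-gradient ascent on the feature distance between clean and perturbed inputs, under an L-infinity budget. The names `encode`, `feature_shift`, and `craft_universal_perturbation`, and all parameter values, are hypothetical.

```python
import math
import random


def encode(weights, x):
    """Toy stand-in for a frozen pre-trained encoder: linear map, then tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def feature_shift(weights, images, delta):
    """Mean squared feature distance between clean and perturbed images."""
    total = 0.0
    for x in images:
        clean = encode(weights, x)
        adv = encode(weights, [xi + di for xi, di in zip(x, delta)])
        total += sum((a - c) ** 2 for a, c in zip(adv, clean))
    return total / len(images)


def craft_universal_perturbation(weights, images, eps=0.1, step=0.02,
                                 iters=50, seed=0):
    """Find one perturbation, shared by all images, that shifts their features.

    Only feature vectors are used; no labels or downstream classifier.
    """
    rng = random.Random(seed)
    dim = len(images[0])
    # Random start: at delta = 0 the gradient of the feature distance is zero.
    delta = [rng.uniform(-eps, eps) for _ in range(dim)]
    for _ in range(iters):
        grad = [0.0] * dim
        for x in images:
            clean = encode(weights, x)
            adv = encode(weights, [xi + di for xi, di in zip(x, delta)])
            for row, a, c in zip(weights, adv, clean):
                # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
                coeff = 2.0 * (a - c) * (1.0 - a * a)
                for j, w in enumerate(row):
                    grad[j] += coeff * w
        # Sign-gradient ascent step, then projection onto the L-infinity ball.
        delta = [max(-eps, min(eps, d + step * math.copysign(1.0, g)))
                 for d, g in zip(delta, grad)]
    return delta


if __name__ == "__main__":
    rng = random.Random(42)
    W = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
    imgs = [[rng.random() for _ in range(8)] for _ in range(10)]
    delta = craft_universal_perturbation(W, imgs)
    print("mean feature shift:", feature_shift(W, imgs, delta))
```

The same `delta` is added to every image, which is what makes the perturbation universal. Because the objective depends only on encoder outputs, it needs no knowledge of the downstream dataset or task. A real attack would use a deep encoder with automatic differentiation and would also clip perturbed pixels to the valid range.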
Taxonomy
Topics: Adversarial Robustness in Machine Learning
