Targeted Downstream-Agnostic Attack
Zhuxin Lei, Ziyuan Yang, Yi Zhang

TL;DR
This paper introduces a targeted downstream-agnostic attack method on pre-trained encoders, using example-specific perturbations and a threat image to reveal vulnerabilities across multiple datasets and models.
Contribution
It proposes a novel targeted downstream-agnostic attack (DAA) approach that combines example-specific perturbations with a threat image, improving attack success rate and invisibility under a stricter threat model.
Findings
Effective across 10 self-supervised methods and 3 datasets
High attack success rate and invisibility achieved
Reveals significant vulnerabilities of pre-trained encoders
Abstract
Recently, pre-trained encoders have gained widespread use due to their strong capability in representation extraction. However, they are vulnerable to downstream-agnostic attacks (DAAs). Existing DAA methods operate under a permissive threat model, where an attack is successful if the generated downstream-agnostic adversarial examples (DAEs) change the original prediction, without requiring a specific target. In this paper, we propose a Targeted DAA (TDAA) method under a stricter threat model requiring the attack to be both targeted and downstream-agnostic. Since the downstream task is unknown and encoders do not directly produce predictions, achieving a targeted attack is particularly challenging. To address this, we introduce a novel component termed the 'threat image', pre-selected by the attacker as the target. Specifically, a generator is designed to produce example-specific…
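The difference between the permissive and the stricter threat model described in the abstract can be illustrated with a small sketch of the two success criteria. This is not the paper's evaluation code: the function names, the toy label lists, and the assumption that the target label is whatever the downstream classifier predicts for the threat image are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def untargeted_success_rate(clean_preds: Sequence[int], adv_preds: Sequence[int]) -> float:
    """Permissive criterion: an attack counts as successful whenever the
    downstream prediction on the adversarial example differs from the
    prediction on the clean example."""
    hits = sum(1 for c, a in zip(clean_preds, adv_preds) if a != c)
    return hits / len(clean_preds)


def targeted_success_rate(adv_preds: Sequence[int], target_label: int) -> float:
    """Stricter criterion: an attack counts as successful only when the
    downstream prediction equals the attacker's target label.
    Assumption: target_label is the downstream prediction for the threat image."""
    hits = sum(1 for a in adv_preds if a == target_label)
    return hits / len(adv_preds)


# Toy downstream predictions (hypothetical values).
clean_preds = [0, 1, 2, 3]
adv_preds = [5, 5, 2, 7]
threat_label = 5  # assumed downstream label of the threat image

untargeted = untargeted_success_rate(clean_preds, adv_preds)
targeted = targeted_success_rate(adv_preds, threat_label)
print(untargeted, targeted)
```

In this toy case three of the four predictions change, but only two land on the threat image's label, so the same adversarial examples score lower under the targeted criterion than under the permissive one.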
