Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples
Ziqi Zhou, Minghui Li, Wei Liu, Shengshan Hu, Yechao Zhang, Wei Wan, Lulu Xue, Leo Yu Zhang, Dezhong Yao, Hai Jin

TL;DR
This paper introduces Gen-AF, a two-stage adversarial fine-tuning method that significantly improves the robustness of pre-trained encoders against adversarial examples across multiple datasets and training methods.
Contribution
We propose Gen-AF, a novel adversarial fine-tuning approach based on genetic evolution. It enhances the robustness of downstream models against downstream-agnostic adversarial examples (DAEs) and addresses limitations of existing defenses.
Findings
Gen-AF achieves high testing accuracy on six datasets.
Gen-AF significantly improves robustness against state-of-the-art DAEs.
The method is effective across ten self-supervised training methods.
Abstract
With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution within the deep learning landscape. Model providers furnish pre-trained encoders designed to function as versatile feature extractors, enabling downstream users to harness the benefits of expansive models with minimal effort through fine-tuning. Nevertheless, recent works have exposed a vulnerability in pre-trained encoders, highlighting their susceptibility to downstream-agnostic adversarial examples (DAEs) meticulously crafted by attackers. The lingering question pertains to the feasibility of fortifying the robustness of downstream models against DAEs, particularly in scenarios where the pre-trained encoders are publicly accessible to the attackers. In this paper, we initially delve into existing defensive mechanisms against adversarial examples within the pre-training…
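The threat described above can be illustrated with a small sketch. The sketch assumes one common way of crafting a DAE: run projected gradient ascent (PGD) under an L-infinity budget to push the encoder's embedding of a perturbed input away from the embedding of the clean input. This objective needs no downstream labels, which is what makes the example downstream-agnostic. The toy linear encoder `W`, the helper names `craft_dae` and `feature_distortion`, and the budget values are hypothetical. The code does not reproduce the specific attacks the paper evaluates, and it does not implement Gen-AF.

```python
import numpy as np


def craft_dae(W, x, eps=8 / 255, alpha=2 / 255, steps=10, seed=0):
    """Hypothetical sketch of a downstream-agnostic adversarial example.

    Assumes a toy linear encoder f(x) = W @ x and maximizes the squared
    feature distance ||f(x + delta) - f(x)||^2 with sign-gradient ascent,
    projecting delta back into the L-infinity ball of radius eps and keeping
    the input in [0, 1]. No downstream labels are used.
    """
    rng = np.random.default_rng(seed)
    # Random start inside the eps-ball: at delta = 0 this loss has zero gradient.
    delta = rng.uniform(-eps, eps, size=x.shape)
    delta = np.clip(x + delta, 0.0, 1.0) - x
    for _ in range(steps):
        # For a linear encoder, f(x + delta) - f(x) = W @ delta,
        # so the gradient of ||W @ delta||^2 is 2 * W^T W delta.
        grad = 2.0 * W.T @ (W @ delta)
        delta = delta + alpha * np.sign(grad)
        # Project onto the L-infinity budget, then onto the valid input range.
        delta = np.clip(delta, -eps, eps)
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return x + delta


def feature_distortion(W, x, x_adv):
    """Squared distance between clean and adversarial embeddings."""
    return float(np.sum((W @ x_adv - W @ x) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(16, 64))    # stand-in for a public pre-trained encoder
    x = rng.uniform(0.0, 1.0, size=64)  # stand-in for a flattened input image
    x_adv = craft_dae(W, x)
    print(feature_distortion(W, x, x_adv))
```

A real attack would replace the linear map with the public encoder and compute gradients with automatic differentiation. The design point carries over unchanged: the perturbation targets the shared feature space rather than any particular downstream classifier. For that reason, such examples can transfer to every model fine-tuned from the same encoder.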
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques
