Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples

Ziqi Zhou; Minghui Li; Wei Liu; Shengshan Hu; Yechao Zhang; Wei Wan; Lulu Xue; Leo Yu Zhang; Dezhong Yao; Hai Jin

arXiv:2403.10801 · cs.CV · March 20, 2024 · 2 cites

TL;DR

This paper introduces Gen-AF, a two-stage adversarial fine-tuning method that improves the robustness of downstream models built on pre-trained encoders against downstream-agnostic adversarial examples (DAEs), evaluated on six datasets and ten self-supervised training methods.

Contribution

We propose Gen-AF, a genetic evolution-based adversarial fine-tuning approach that improves downstream model robustness against downstream-agnostic adversarial examples (DAEs) and addresses limitations of existing defenses.

Findings

01 Gen-AF achieves high testing accuracy on six datasets.

02 Gen-AF significantly improves robustness against state-of-the-art DAEs.

03 The method is effective across ten self-supervised training methods.

Abstract

With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution within the deep learning landscape. Model providers furnish pre-trained encoders designed to function as versatile feature extractors, enabling downstream users to harness the benefits of expansive models with minimal effort through fine-tuning. Nevertheless, recent works have exposed a vulnerability in pre-trained encoders, highlighting their susceptibility to downstream-agnostic adversarial examples (DAEs) meticulously crafted by attackers. The lingering question pertains to the feasibility of fortifying the robustness of downstream models against DAEs, particularly in scenarios where the pre-trained encoders are publicly accessible to the attackers. In this paper, we initially delve into existing defensive mechanisms against adversarial examples within the pre-training…
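To make the threat model in the abstract concrete, here is a minimal sketch of a downstream-agnostic perturbation: the attacker sees only the public encoder, never the downstream head or labels, and pushes the features of an input away from their clean values under an L-infinity budget. Everything in it is an assumption for illustration: the two-dimensional linear `encode`, the helper names `feature_shift` and `craft_feature_attack`, and the `epsilon`, `step`, and `iters` values. It is not the attack studied in the paper, and it is not Gen-AF.

```python
import random


def encode(weights, x):
    """Toy stand-in for a public pre-trained encoder: a fixed linear map."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def feature_shift(weights, x, x_adv):
    """Squared L2 distance between clean and perturbed features."""
    clean, adv = encode(weights, x), encode(weights, x_adv)
    return sum((a - c) ** 2 for a, c in zip(adv, clean))


def craft_feature_attack(weights, x, epsilon=0.1, step=0.02, iters=20, seed=0):
    """Signed-gradient ascent on the feature shift, using the encoder only."""
    rng = random.Random(seed)
    # Random start inside the budget: at delta = 0 the gradient vanishes.
    delta = [rng.uniform(-epsilon, epsilon) for _ in x]
    for _ in range(iters):
        # For a linear encoder the shift is ||W delta||^2,
        # so its gradient with respect to delta is 2 * W^T (W delta).
        w_delta = encode(weights, delta)
        grad = [
            2 * sum(weights[j][i] * w_delta[j] for j in range(len(weights)))
            for i in range(len(x))
        ]
        # Take a signed step, then clip back into the L-infinity ball.
        delta = [
            max(-epsilon, min(epsilon, d + step * ((g > 0) - (g < 0))))
            for d, g in zip(delta, grad)
        ]
    return [xi + d for xi, d in zip(x, delta)]


# Hypothetical encoder weights and input, chosen only for the demo.
weights = [[1.0, -2.0], [0.5, 1.0]]
x = [0.3, 0.7]
x_adv = craft_feature_attack(weights, x)
print("perturbed input:", x_adv)
print("feature shift:", feature_shift(weights, x, x_adv))
```

Because the objective never references a downstream classifier, the same perturbed input can be reused against any model fine-tuned from the encoder, which is why public access to the encoder matters in this setting.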

Code & Models

Repositories

cgcl-codes/gen-af (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques