Adaptive Modeling Against Adversarial Attacks

Zhiwen Yan; Teck Khim Ng

arXiv:2112.12431 · cs.LG · December 24, 2021

TL;DR

This paper proposes an inference-stage fine-tuning algorithm that improves the robustness of adversarially trained deep learning models against white-box attacks, raising the PGD accuracy of a CIFAR10 classifier from 46.8% to 64.5%.

Contribution

It introduces a post-training method, applied at inference time, that adapts the model to each adversarial input using the existing training data, improving robustness beyond what standard adversarial training achieves.
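The page does not spell out the post-training procedure, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm (their PyTorch code is in the jokeryan/post_training repository). Assumptions, all hypothetical: a linear softmax classifier stands in for the CIFAR10 network; the "neighbor" class is taken to be the runner-up logit; training samples are drawn in equal numbers from the two classes (a loose reading of the "Balanced Selection" label); the copy of the model is tuned on FGSM-perturbed versions of those samples; and the final decision is restricted to the two candidate classes. The function name `post_train_predict` and its parameters are illustrative.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def input_grad(W, b, X, y):
    # Per-example gradient of cross-entropy w.r.t. the inputs, logits = X @ W + b
    g = softmax(X @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    return g @ W.T


def post_train_predict(W, b, x, X_train, y_train, n_per_class=16,
                       steps=5, lr=0.05, eps=8 / 255, rng=None):
    """Hypothetical inference-stage post-training for one input x of shape (d,).

    Assumptions (not taken from the paper): neighbor class = runner-up logit,
    equal numbers of training samples from both classes, FGSM-perturbed
    copies of those samples as the tuning data.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    order = np.argsort(x @ W + b)[::-1]
    c_orig, c_nbr = int(order[0]), int(order[1])

    # Balanced selection: the same number of training samples from each class
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y_train == c), size=n_per_class, replace=True)
        for c in (c_orig, c_nbr)
    ])
    Xs, ys = X_train[idx], y_train[idx]

    # Tune a copy so the deployed weights W, b stay untouched
    W2, b2 = W.copy(), b.copy()
    for _ in range(steps):
        Xa = np.clip(Xs + eps * np.sign(input_grad(W2, b2, Xs, ys)), 0.0, 1.0)
        g = softmax(Xa @ W2 + b2)
        g[np.arange(len(ys)), ys] -= 1.0
        g /= len(ys)
        W2 = W2 - lr * (Xa.T @ g)      # mean cross-entropy gradient w.r.t. W
        b2 = b2 - lr * g.sum(axis=0)   # mean cross-entropy gradient w.r.t. b

    # Decide only between the original class and its neighbor
    scores = x @ W2 + b2
    return c_orig if scores[c_orig] >= scores[c_nbr] else c_nbr
```

Tuning a per-input copy mirrors the idea that the base model is adapted at inference time rather than retrained; the cost is several extra gradient steps for every prediction.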

Findings

01. Accuracy under white-box PGD attack improved from 46.8% to 64.5%.
02. The method strengthens adversarial defense without retraining the model from scratch.
03. Significant accuracy gains on the CIFAR10 dataset.
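The percentages above are robust accuracies: the fraction of test inputs still classified correctly after a white-box PGD attack. A minimal NumPy sketch of such an evaluation follows. It assumes an L-infinity PGD attack with a random start, a linear softmax classifier in place of the CIFAR10 network, and budget values (eps = 8/255, step 2/255, 10 steps) that are assumptions, since the page does not state the attack settings.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def input_grad(W, b, X, y):
    # Per-example gradient of cross-entropy w.r.t. the inputs, logits = X @ W + b
    g = softmax(X @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    return g @ W.T


def pgd_attack(W, b, X, y, eps=8 / 255, alpha=2 / 255, steps=10, rng=None):
    """L-infinity PGD with a random start; inputs are assumed to lie in [0, 1]."""
    rng = np.random.default_rng(0) if rng is None else rng
    X_adv = np.clip(X + rng.uniform(-eps, eps, size=X.shape), 0.0, 1.0)
    for _ in range(steps):
        X_adv = X_adv + alpha * np.sign(input_grad(W, b, X_adv, y))  # ascend the loss
        X_adv = np.clip(X_adv, X - eps, X + eps)  # project onto the eps-ball around X
        X_adv = np.clip(X_adv, 0.0, 1.0)          # keep a valid pixel range
    return X_adv


def accuracy(W, b, X, y):
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))


def robust_accuracy(W, b, X, y, **pgd_kwargs):
    # Fraction of inputs still classified correctly after the attack
    return accuracy(W, b, pgd_attack(W, b, X, y, **pgd_kwargs), y)
```

Because the attacker sees the model's gradients (white-box), the same `input_grad` used for training also drives the attack; a defense is only credited for inputs that survive every projected step.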

Abstract

Adversarial training, the process of training a deep learning model on adversarial data, is one of the most successful adversarial defense methods for deep learning models. We have found that the robustness of an adversarially trained model to white-box attacks can be further improved if we fine-tune the model at the inference stage to adapt to the adversarial input, using the extra information that input carries. We introduce an algorithm that "post-trains" the model at the inference stage between the original output class and a "neighbor" class, using existing training data. With our algorithm, the accuracy of a pre-trained Fast-FGSM CIFAR10 classifier base model against the white-box projected gradient descent (PGD) attack improves significantly, from 46.8% to 64.5%.
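The base model in the abstract is adversarially trained with Fast-FGSM, i.e. FGSM adversarial training with a uniform random start inside the perturbation ball. The sketch below shows one such training step for a linear softmax classifier in NumPy. The model, the function name `fast_fgsm_step`, the learning rate, and the step size alpha = 1.25 * eps (a setting commonly used for this recipe) are assumptions for illustration, not the authors' training configuration.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(W, b, X, y):
    # Mean cross-entropy of the linear model logits = X @ W + b
    p = softmax(X @ W + b)
    return float(-np.mean(np.log(p[np.arange(len(y)), y])))


def input_grad(W, b, X, y):
    # Per-example gradient of cross-entropy w.r.t. the inputs
    g = softmax(X @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    return g @ W.T


def fast_fgsm_step(W, b, X, y, eps=8 / 255, lr=0.1, rng=None):
    """One hypothetical Fast-FGSM training step; returns (W, b, X_adv)."""
    rng = np.random.default_rng(0) if rng is None else rng
    alpha = 1.25 * eps  # assumed step size
    # Random start inside the eps-ball, then a single FGSM step
    delta = rng.uniform(-eps, eps, size=X.shape)
    delta = delta + alpha * np.sign(input_grad(W, b, np.clip(X + delta, 0.0, 1.0), y))
    delta = np.clip(delta, -eps, eps)
    X_adv = np.clip(X + delta, 0.0, 1.0)
    # Gradient-descent update on the adversarial batch (mean cross-entropy)
    g = softmax(X_adv @ W + b)
    g[np.arange(len(y)), y] -= 1.0
    g /= len(y)
    return W - lr * (X_adv.T @ g), b - lr * g.sum(axis=0), X_adv
```

A single FGSM step per batch is what makes this recipe cheap compared with multi-step PGD training, which is also why a PGD evaluation of the resulting model leaves room for further gains.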

Code & Models

Repositories

jokeryan/post_training (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Balanced Selection