Adaptive Modeling Against Adversarial Attacks
Zhiwen Yan, Teck Khim Ng

TL;DR
This paper proposes an inference-stage fine-tuning algorithm that improves the robustness of adversarially trained deep learning models to white-box attacks, raising the accuracy of a pre-trained Fast-FGSM CIFAR10 classifier under PGD attack from 46.8% to 64.5%.
Contribution
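It introduces a post-training step at inference time that fine-tunes an adversarially trained model on existing training data, between the original output class and a "neighbor" class, so that the model adapts to each adversarial input. This improves robustness beyond what adversarial training alone achieves.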
Findings
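Accuracy of a pre-trained Fast-FGSM CIFAR10 classifier against white-box PGD attack improved from 46.8% to 64.5%, a gain of 17.7 percentage points.
The method strengthens an adversarially trained model without retraining it from scratch.
Post-training relies only on existing training data and the information carried by the adversarial input itself.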
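For readers unfamiliar with the attack named in the findings, the following is a minimal sketch of an L-infinity PGD attack. It is not the paper's evaluation code: the model is a toy linear softmax classifier chosen so the input gradient can be written by hand, and the names `pgd_attack`, `eps`, `alpha`, and `steps` and their default values are assumptions made for illustration. The sketch also omits the random start and the clipping to a valid pixel range that practical implementations often include.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    # Linear model: one row of W and one bias per class.
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def input_gradient(W, b, x, y):
    # Gradient of the cross-entropy loss with respect to the input x.
    # dL/dz_k = p_k - 1[k == y], and dL/dx_i = sum_k dL/dz_k * W[k][i].
    p = softmax(logits(W, b, x))
    dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(dz[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]

def pgd_attack(W, b, x, y, eps=0.3, alpha=0.1, steps=10):
    # Hypothetical defaults; eps bounds the L-infinity perturbation.
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(W, b, x_adv, y)
        # Ascent step along the sign of the gradient.
        x_adv = [v + alpha * ((gi > 0) - (gi < 0)) for v, gi in zip(x_adv, g)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(v, x0 - eps), x0 + eps) for v, x0 in zip(x_adv, x)]
    return x_adv
```

The attack is "white-box" in the sense that it reads the model parameters `W` and `b` directly to compute the gradient.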
Abstract
Adversarial training, the process of training a deep learning model with adversarial data, is one of the most successful adversarial defense methods for deep learning models. We have found that the robustness of an adversarially trained model to white-box attacks can be further improved if we fine-tune the model at the inference stage to adapt to the adversarial input, using the extra information it contains. We introduce an algorithm that "post-trains" the model at the inference stage between the original output class and a "neighbor" class, using existing training data. With our algorithm, the accuracy of a pre-trained Fast-FGSM CIFAR10 classifier base model against a white-box projected gradient descent (PGD) attack improves significantly, from 46.8% to 64.5%.
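The post-training idea in the abstract can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' algorithm: the model is a linear softmax classifier, the "neighbor" class is assumed to be the class with the second-highest logit (the abstract does not say how it is chosen), and the function name `post_train_predict` and the parameters `lr` and `epochs` are hypothetical. The sketch takes a copy of the model, fine-tunes it on existing training examples from the two candidate classes only, and then re-classifies the input.

```python
import copy
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def post_train_predict(W, b, x, train_set, lr=0.1, epochs=20):
    # Step 1: the base model's output class and an assumed "neighbor" class.
    z = logits(W, b, x)
    ranked = sorted(range(len(z)), key=lambda k: z[k], reverse=True)
    original, neighbor = ranked[0], ranked[1]  # assumption: runner-up class

    # Step 2: keep only existing training data from those two classes.
    subset = [(xi, yi) for xi, yi in train_set if yi in (original, neighbor)]

    # Step 3: fine-tune a copy so the base model stays untouched.
    W2, b2 = copy.deepcopy(W), list(b)
    for _ in range(epochs):
        for xi, yi in subset:
            p = softmax(logits(W2, b2, xi))
            for k in range(len(W2)):
                dz = p[k] - (1.0 if k == yi else 0.0)  # cross-entropy gradient
                b2[k] -= lr * dz
                for i in range(len(xi)):
                    W2[k][i] -= lr * dz * xi[i]

    # Step 4: re-classify the input with the adapted copy.
    return predict(W2, b2, x), original, neighbor
```

Because the adaptation is applied to a copy, each incoming input starts from the same base model; whether the paper resets the model between inputs in this way is not stated in the abstract.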
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Balanced Selection
