Heat and Blur: An Effective and Fast Defense Against Adversarial Examples
Haya Brama, Tal Grinshpoun

TL;DR
This paper introduces a fast, model-agnostic defense against adversarial examples that uses feature visualization and blurring to mitigate attacks, validated on ImageNet with VGG19.
Contribution
It proposes a novel, simple defense method combining feature visualization and input blurring applicable to pre-trained networks, along with a new evaluation metric.
Findings
Effective against various adversarial attacks
Applicable to pre-trained models without retraining
Improves robustness as shown on ImageNet with VGG19
Abstract
The growing incorporation of artificial neural networks (NNs) into many fields, and especially into life-critical systems, is restrained by their vulnerability to adversarial examples (AEs). Some existing defense methods can increase NNs' robustness, but they often require special architecture or training procedures and are irrelevant to already trained models. In this paper, we propose a simple defense that combines feature visualization with input modification, and can, therefore, be applicable to various pre-trained networks. By reviewing several interpretability methods, we gain new insights regarding the influence of AEs on NNs' computation. Based on that, we hypothesize that information about the "true" object is preserved within the NN's activity, even when the input is adversarial, and present a feature visualization version that can extract that information in the form of…
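The abstract describes the defense only at a high level: a relevance heat map obtained by feature visualization, combined with a modification (blurring) of the input. The sketch below is one possible reading of that combination, not the authors' implementation. It assumes the heat map is already available as a 2-D array, and it assumes that regions with high relevance are kept sharp while the rest of the image is blurred. The function names `gaussian_kernel`, `blur`, and `heat_guided_blur`, and the parameter `sigma`, are hypothetical and chosen for illustration.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D Gaussian kernel, truncated at about 3 sigma and normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an (H, W, C) float image, using edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="edge")
    # Convolve along the height axis, then along the width axis.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, tmp)


def heat_guided_blur(image: np.ndarray, heatmap: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Blend the sharp and blurred image, weighted per pixel by the heat map.

    ASSUMPTION (not taken from the paper): weight 1 keeps the original pixel,
    weight 0 uses the fully blurred pixel.
    """
    h = heatmap.astype(np.float64) - heatmap.min()
    span = h.max()
    # Rescale to [0, 1]; a constant heat map carries no information, so blur everything.
    h = h / span if span > 0 else np.zeros_like(h)
    w = h[..., None]  # broadcast the weight over the channel axis
    return w * image + (1.0 - w) * blur(image, sigma)
```

In a full pipeline the output of `heat_guided_blur` would be passed back to the pre-trained classifier in place of the original input. How the heat map is actually computed, and how it drives the blurring in the paper, is not specified in the text above.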
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
Methods: Interpretability
