Recovering Localized Adversarial Attacks

Jan Philip Göpfert; Heiko Wersing; Barbara Hammer

arXiv:1910.09239 · cs.LG · October 22, 2019

TL;DR

This paper evaluates how well explainability methods identify the regions of an image that an adversarial attack has manipulated to mislead a deep neural network classifier. It finds that LIME performs best in this setting.

Contribution

The study compares how well different explainers locate the image regions modified by an adversarial attack, and highlights LIME's effectiveness at this task.
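To make "locating the attacked region" quantitative, one option is to check how many of an explainer's most relevant pixels fall inside the region the attack actually changed. The sketch below is an illustrative assumption, not necessarily the scoring the authors use. The function name `region_recovery` and the top-k rule are hypothetical.

```python
from typing import Sequence


def region_recovery(relevance: Sequence[float], mask: Sequence[bool], k: int) -> float:
    """Fraction of the k most relevant pixels that lie inside the attacked region.

    Hypothetical metric for illustration only.
    relevance: one explainer score per pixel (higher = more important).
    mask:      True where the adversarial perturbation was applied.
    k:         number of top-ranked pixels to inspect.
    """
    if len(relevance) != len(mask):
        raise ValueError("relevance and mask must have the same length")
    if not 0 < k <= len(relevance):
        raise ValueError("k must be between 1 and the number of pixels")
    # Rank pixel indices by descending relevance and keep the top k.
    top_k = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)[:k]
    # Count how many of those pixels were actually perturbed by the attack.
    hits = sum(1 for i in top_k if mask[i])
    return hits / k
```

If `k` equals the number of attacked pixels, a score of 1.0 means that the explainer's top-ranked pixels coincide exactly with the manipulated region.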

Findings

1. LIME outperforms the other explainers tested at identifying adversarially manipulated regions.
2. Explainability methods can be used to detect adversarial attacks.
3. The quality of explanations for deep neural networks varies significantly across explanation methods.

Abstract

Deep convolutional neural networks have achieved great successes over recent years, particularly in the domain of computer vision. They are fast, convenient, and, thanks to mature frameworks, relatively easy to implement and deploy. However, their reasoning is hidden inside a black box, in spite of a number of proposed approaches that try to provide human-understandable explanations for the predictions of neural networks. It is still a matter of debate which of these explainers are best suited for which situations, and how to quantitatively evaluate and compare them. In this contribution, we focus on the capabilities of explainers for convolutional deep neural networks in an extreme situation: a setting in which humans and networks fundamentally disagree. Deep neural networks are susceptible to adversarial attacks that deliberately modify input samples to mislead a neural network's…
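The abstract describes adversarial attacks that deliberately modify input samples to mislead a network. A *localized* attack, as the title suggests, confines that modification to a region of the image. The sketch below illustrates the idea on a toy linear classifier, where the gradient has a closed form. The model, the step size, the L∞ budget `eps`, and the function names are all assumptions for illustration, not the attack used in the paper.

```python
from typing import List, Sequence


def scores(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Class scores of a toy linear classifier: s_c = w_c . x (hypothetical model)."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]


def predict(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the highest-scoring class (ties resolve to the lowest index)."""
    s = scores(weights, x)
    return max(range(len(s)), key=lambda c: s[c])


def localized_attack(weights, x, mask, target, step=0.05, eps=0.3, max_iter=100):
    """Targeted, masked iterative gradient-sign attack (illustrative sketch).

    Only pixels with mask[i] True are changed. Each changed pixel stays within
    eps of its original value and inside the valid range [0, 1].
    """
    adv = list(x)
    for _ in range(max_iter):
        current = predict(weights, adv)
        if current == target:
            break
        # For a linear model, d(s_target - s_current)/dx_i = w_target[i] - w_current[i].
        grad = [wt - wc for wt, wc in zip(weights[target], weights[current])]
        for i, inside in enumerate(mask):
            if not inside or grad[i] == 0:
                continue  # pixels outside the region are never touched
            moved = adv[i] + (step if grad[i] > 0 else -step)
            # Project into the eps-ball around the original pixel, then into [0, 1].
            moved = min(max(moved, x[i] - eps), x[i] + eps)
            adv[i] = min(max(moved, 0.0), 1.0)
    return adv
```

With `weights = [[1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 2.5, 2.5]]` and `x = [0.5] * 6`, the clean input is assigned class 0. Perturbing only the last two pixels (`mask = [False] * 4 + [True] * 2`) flips the prediction to class 1 after a few steps, while the other four pixels stay unchanged.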

Taxonomy

Methods: Local Interpretable Model-Agnostic Explanations (LIME)
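LIME explains a single prediction in three steps. It switches image segments (superpixels) on and off, queries the model on each perturbed image, and fits a locally weighted linear surrogate whose coefficients rank the segments. The following is a minimal, dependency-free sketch of that idea, not the reference `lime` package. The segmentation is assumed to be given. The proximity kernel, the ridge penalty, and the names `lime_segment_weights` and `solve` are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_segment_weights(
    predict: Callable[[Sequence[int]], float],
    n_segments: int,
    n_samples: int = 500,
    kernel_width: float = 0.25,
    ridge: float = 1e-3,
    seed: int = 0,
) -> List[float]:
    """LIME-style importance of each image segment for one class score (sketch).

    predict(z) must return the class probability for the image in which segment i
    is kept if z[i] == 1 and removed (e.g. greyed out) if z[i] == 0.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for k in range(n_samples):
        # The first sample is the unperturbed image; the rest switch segments off at random.
        z = [1] * n_segments if k == 0 else [rng.randint(0, 1) for _ in range(n_segments)]
        # Simple exponential proximity kernel on the fraction of removed segments (assumption).
        dist = 1.0 - sum(z) / n_segments
        weights.append(math.exp(-(dist ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in z])  # intercept + segment indicators
        targets.append(predict(z))
    p = n_segments + 1
    # Weighted ridge regression via the normal equations; the intercept is not penalized.
    a = [
        [
            sum(w * row[i] * row[j] for w, row in zip(weights, rows))
            + (ridge if i == j and i > 0 else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]
    b = [sum(w * row[i] * y for w, row, y in zip(weights, rows, targets)) for i in range(p)]
    beta = solve(a, b)
    return beta[1:]  # drop the intercept; one coefficient per segment
```

For an additive toy model such as `lambda z: sum(c * v for c, v in zip([0.05, 0.1, 0.7, 0.05], z))`, the returned coefficients come out close to those contributions, so segment 2 ranks first. In the paper's setting, a useful explainer would analogously rank highest the segments that overlap the attacked region.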