Explainable AI for Natural Adversarial Images

Tomas Folke; ZhaoBin Li; Ravi B. Sojitra; Scott Cheng-Hsin Yang; Patrick Shafto

arXiv:2106.09106 · cs.AI · June 18, 2021 · 1 citation

Open Access

TL;DR

This paper investigates whether explainable AI techniques, specifically saliency maps and example-based explanations, help humans predict AI classification errors on natural adversarial images, which would make human oversight of AI decisions more effective.

Contribution

The paper evaluates how well explainable AI methods help humans anticipate AI mistakes on adversarial and standard images, and finds that saliency maps are more effective than examples.

Findings

01. Saliency maps outperform example-based explanations in helping humans predict AI errors.

02. Both explanations improve error detection, but their effects are not additive.

03. Explainable AI methods can enhance human oversight of AI systems.

Abstract

Adversarial images highlight how vulnerable modern image classifiers are to perturbations outside of their training set. Human oversight might mitigate this weakness, but depends on humans understanding the AI well enough to predict when it is likely to make a mistake. In previous work we have found that humans tend to assume that the AI's decision process mirrors their own. Here we evaluate if methods from explainable AI can disrupt this assumption to help participants predict AI classifications for adversarial and standard images. We find that both saliency maps and examples facilitate catching AI errors, but their effects are not additive, and saliency maps are more effective than examples.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications