Explainable AI for Natural Adversarial Images
Tomas Folke, ZhaoBin Li, Ravi B. Sojitra, Scott Cheng-Hsin Yang, and Patrick Shafto

TL;DR
This paper investigates how explainable AI techniques like saliency maps and example-based explanations can help humans better predict AI errors on adversarial images, improving human oversight of AI decisions.
Contribution
It evaluates how well explainable AI methods help humans anticipate AI mistakes on adversarial and standard images, and finds saliency maps to be particularly effective.
Findings
Saliency maps outperform example-based explanations in helping humans predict AI errors.
Both saliency maps and examples improve error detection, but their effects are not additive.
Explainable AI methods can enhance human oversight of AI systems.
Abstract
Adversarial images highlight how vulnerable modern image classifiers are to perturbations outside of their training set. Human oversight might mitigate this weakness, but depends on humans understanding the AI well enough to predict when it is likely to make a mistake. In previous work we have found that humans tend to assume that the AI's decision process mirrors their own. Here we evaluate if methods from explainable AI can disrupt this assumption to help participants predict AI classifications for adversarial and standard images. We find that both saliency maps and examples facilitate catching AI errors, but their effects are not additive, and saliency maps are more effective than examples.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
