Interpreting Undesirable Pixels for Image Classification on Black-Box Models
Sin-Han Kang, Hong-Gyu Jung, and Seong-Whan Lee

TL;DR
This paper introduces a novel explanation method for black-box image classifiers that visualizes undesirable pixels which interfere with correct classification, providing both qualitative heatmaps and quantitative metrics.
Contribution
It proposes a new approach to interpret and visualize pixels that negatively impact black-box model predictions, focusing on both target and non-target class interference.
Findings
Visualizes undesirable regions on heatmaps for qualitative analysis.
Provides a new evaluation metric for quantitative assessment on ImageNet.
Enhances understanding of factors that hinder accurate image classification.
Abstract
In an effort to interpret black-box models, research on explanation methods has advanced in recent years. Most studies have tried to identify the input pixels that are crucial to a classifier's prediction. While this approach is useful for analysing the characteristics of black-box models, it is also important to investigate pixels that interfere with the prediction. To address this issue, we propose an explanation method that visualizes regions that are undesirable for classifying an image as a target class. Specifically, we divide the concept of undesirable regions into two terms: (1) factors for a target class, which hinder black-box models from identifying the intrinsic characteristics of the target class, and (2) factors for non-target classes, which are regions important for the image to be classified as other classes. We visualize such undesirable regions on heatmaps to…
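The notion of pixels that interfere with a prediction can be illustrated with a simple occlusion probe. The sketch below is not the method proposed in the paper, whose details the abstract does not give. It is a generic perturbation test that assumes only query access to a classifier. The names `undesirable_heatmap`, `toy_predict`, `patch`, and `fill` are hypothetical, and the toy classifier is invented purely so the example runs on its own. A patch is marked as undesirable when blanking it raises the target-class probability.

```python
import numpy as np

def undesirable_heatmap(predict, image, target, patch=4, fill=0.0):
    """Occlusion probe (illustrative, not the paper's algorithm).

    predict: black-box function mapping an image to class probabilities.
    Returns a map whose positive values mark patches that, when blanked,
    increase the probability of the target class.
    """
    h, w = image.shape[:2]
    base = predict(image)[target]
    heat = np.zeros((h, w), dtype=float)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            # Gain in target probability after removing this patch.
            heat[y:y + patch, x:x + patch] = predict(occluded)[target] - base
    # Keep only patches whose removal helps the target class.
    return np.maximum(heat, 0.0)

def toy_predict(img):
    """Hypothetical two-class black box: the left half supports class 0,
    the right half supports class 1."""
    left, right = img[:, :4].sum(), img[:, 4:].sum()
    logits = np.array([left - right, right - left])
    e = np.exp(logits - logits.max())
    return e / e.sum()

image = np.full((8, 8), 0.1)
heat = undesirable_heatmap(toy_predict, image, target=0)
```

With this toy classifier, the right half of `heat` is positive (those pixels push the image toward the non-target class) and the left half is zero. A real study would replace `toy_predict` with queries to the model under inspection and would need a more careful choice of fill value than a constant.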
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
