Counterfactual Explanation Based on Gradual Construction for Deep Networks
Hong-Gyu Jung, Sin-Han Kang, Hee-Dong Kim, Dong-Ok Won, Seong-Whan Lee

TL;DR
This paper introduces a novel counterfactual explanation method for deep networks that gradually constructs explanations by iteratively selecting and optimizing features, resulting in clearer, more human-friendly interpretations aligned with training data distributions.
Contribution
The proposed method alternates masking and composition steps, using training data statistics to produce more realistic and understandable counterfactual explanations.
Findings
Produces human-friendly interpretations across various datasets
Achieves explanations with fewer feature modifications
Verifies alignment with training data distribution
Abstract
To understand the black-box characteristics of deep networks, counterfactual explanation, which deduces not only the important features of an input space but also how those features should be modified to classify the input as a target class, has gained increasing interest. The patterns that deep networks have learned from a training dataset can be grasped by observing the feature variation among various classes. However, current approaches modify features to increase the classification probability for the target class irrespective of the internal characteristics of deep networks. This often leads to unclear explanations that deviate from real-world data distributions. To address this problem, we propose a counterfactual explanation method that exploits the statistics learned from a training dataset. Specifically, we gradually construct an explanation by iterating over masking…
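The loop of feature selection followed by optimization that the abstract describes can be illustrated with a toy sketch. This is not the authors' implementation: the linear classifier, the gradient-magnitude selection rule, the squared-error objective, and the reference logits `ref_logits` (standing in for statistics gathered from training samples of the target class) are all assumptions made for illustration, and every name below is hypothetical.

```python
def logits(W, b, x):
    """Logits of a toy linear classifier: z_k = W[k] . x + b[k]."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda k: values[k])


def gradual_counterfactual(W, b, x, target, ref_logits, steps=200, lr=0.1):
    """Add one feature at a time until the toy model predicts `target`.

    ref_logits is an ASSUMED stand-in for training-data statistics:
    the mean logit vector of training samples from the target class.
    """
    x_cf = list(x)
    selected = []
    while argmax(logits(W, b, x_cf)) != target and len(selected) < len(x):
        # Masking step (assumed rule): pick the unselected feature with the
        # largest |d z_target / d x_i|; for a linear model this is |W[target][i]|.
        remaining = [i for i in range(len(x)) if i not in selected]
        selected.append(max(remaining, key=lambda i: abs(W[target][i])))

        # Composition step (assumed objective): gradient descent on
        # sum_k (z_k - ref_logits[k])^2, updating only the selected features.
        for _ in range(steps):
            z = logits(W, b, x_cf)
            grads = {
                i: sum(2.0 * (z[k] - ref_logits[k]) * W[k][i] for k in range(len(z)))
                for i in selected
            }
            for i, g in grads.items():
                x_cf[i] -= lr * g
    return x_cf, selected


# Hypothetical two-class, three-feature example.
W = [[1.0, 0.0, 0.2], [-1.0, 0.5, 0.1]]
b = [0.0, 0.0]
x = [1.0, 0.2, 0.3]          # originally classified as class 0
ref_logits = [-1.0, 1.0]     # assumed mean logits of class-1 training samples

x_cf, selected = gradual_counterfactual(W, b, x, target=1, ref_logits=ref_logits)
print("modified features:", selected)
print("counterfactual input:", x_cf)
```

In this toy case the loop stops as soon as the prediction flips, so features that were never selected keep their original values. That mirrors the stated goal of explaining with fewer feature modifications, although the real method operates on deep networks rather than a linear model.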
