Using Deep Image Priors to Generate Counterfactual Explanations
Vivek Narayanaswamy, Jayaraman J. Thiagarajan, Andreas Spanias

TL;DR
This paper introduces a novel regularization method using an auxiliary loss to improve deep image priors for generating meaningful counterfactual explanations in image classification, demonstrated on skin lesion detection.
Contribution
It proposes a new regularization strategy with an auxiliary loss estimator to guide deep image priors for more interpretable counterfactual image generation.
Findings
The proposed method produces more natural and meaningful counterfactuals than unregularized DIP inversion.
Standard DIP often generates imperceptible and irrelevant perturbations.
Empirical results on skin lesion data validate the approach's effectiveness.
Abstract
Through the use of carefully tailored convolutional neural network architectures, a deep image prior (DIP) can be used to obtain pre-images from latent representation encodings. Though DIP inversion has been known to be superior to conventional regularized inversion strategies such as total variation, such an over-parameterized generator is able to effectively reconstruct even images that are not in the original data distribution. This limitation makes it challenging to utilize such priors for tasks such as counterfactual reasoning, wherein the goal is to generate small, interpretable changes to an image that systematically lead to changes in the model prediction. To this end, we propose a novel regularization strategy based on an auxiliary loss estimator jointly trained with the predictor, which efficiently guides the prior to recover natural pre-images. Our empirical studies with a…
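The optimization problems the abstract describes can be written out roughly as follows. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols F (feature extractor), P (predictor), A (auxiliary loss estimator), g_θ (DIP generator with weights θ), z (fixed noise input), y' (target class), and the weights λ₁, λ₂ are all notation introduced here for illustration. The abstract only states that the auxiliary estimator is trained jointly with the predictor and guides the prior toward natural pre-images; how it enters the objective is assumed.

```latex
% Pre-image recovery with a deep image prior (standard DIP inversion):
% fit the generator weights so the generated image matches a target encoding h = F(x).
\theta^{*} = \arg\min_{\theta} \; \big\lVert F\big(g_{\theta}(z)\big) - h \big\rVert_2^2,
\qquad \hat{x} = g_{\theta^{*}}(z)

% Counterfactual generation for a query image x and target class y' (assumed form):
% push the prediction toward y', stay close to x in latent space,
% and penalize images the auxiliary estimator A scores as unnatural.
\theta^{*} = \arg\min_{\theta} \;
  \mathcal{L}_{\mathrm{CE}}\big(P(g_{\theta}(z)),\, y'\big)
  + \lambda_1 \big\lVert F\big(g_{\theta}(z)\big) - F(x) \big\rVert_2^2
  + \lambda_2 \, A\big(g_{\theta}(z)\big)
```

Without the last term, an over-parameterized g_θ can satisfy the first two terms with perturbations that are imperceptible or fall outside the data distribution, which is the failure mode the abstract attributes to unregularized DIP inversion.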
