TL;DR
IG2 is a feature attribution method that uses the gradients of both the explicand and the counterfactual, folding the counterfactual gradient into the integration path iteratively, to improve explanation accuracy and robustness over existing Integrated Gradients variants.
Contribution
The paper proposes IG2, a new path-based attribution method that incorporates counterfactual gradients iteratively, addressing noise and baseline issues in traditional Integrated Gradients.
Findings
IG2 outperforms state-of-the-art attribution methods on multiple benchmarks.
IG2 effectively reduces attribution noise and baseline arbitrariness.
Experimental validation on diverse datasets confirms IG2's superior explanation quality.
Abstract
Feature attribution explains Artificial Intelligence (AI) at the instance level by providing importance scores of input features' contributions to model prediction. Integrated Gradients (IG) is a prominent path attribution method for deep neural networks, involving the integration of gradients along a path from the explained input (explicand) to a counterfactual instance (baseline). Current IG variants primarily focus on the gradient of the explicand's output. However, our research indicates that the gradient of the counterfactual output significantly affects feature attribution as well. To account for this, we propose Iterative Gradient path Integrated Gradients (IG2), which considers both gradients. IG2 incorporates the counterfactual gradient iteratively into the integration path, generating a novel path (GradPath) and a novel baseline (GradCF). These two novel IG components effectively address…
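To make the path-integration idea in the abstract concrete, here is a minimal sketch of standard Integrated Gradients on a straight-line path from a baseline to the explicand. It is not IG2: the abstract does not specify how GradPath or GradCF are built, so neither is attempted here. The toy logistic model, its weights, the all-zeros baseline, and the function names are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    # Hypothetical toy model: logistic regression output in (0, 1).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    # Analytic gradient of the toy model with respect to the input x.
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    # Standard IG on a straight-line path (assumed here; IG2 uses a different path).
    # Approximates (x_i - x'_i) * integral over alpha in [0, 1] of
    # df/dx_i evaluated at x' + alpha * (x - x'), using a midpoint Riemann sum.
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        g = model_grad(point, w, b)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]

# Illustrative values only.
w = [2.0, -1.0, 0.5]
b = 0.1
x = [1.0, 2.0, -1.0]          # explicand
baseline = [0.0, 0.0, 0.0]    # assumed all-zeros baseline

attr = integrated_gradients(x, baseline, w, b)

# Completeness check: attributions should sum to f(x) - f(baseline).
gap = abs(sum(attr) - (model(x, w, b) - model(baseline, w, b)))
print(attr, gap)
```

The completeness check at the end is a quick sanity test for any path method: the attributions should sum to the difference between the model output at the explicand and at the baseline, up to numerical integration error. Changing `baseline` changes the attributions, which is the baseline sensitivity the abstract refers to.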
