Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI
Won Jun Kim, Hyungjin Chung, Jaemin Kim, Sangmin Lee, Byeongsu Sim, Jong Chul Ye

TL;DR
This paper introduces FreeMCG, a derivative-free approximation of the manifold-constrained gradient. It needs access only to a model's outputs and is reported to produce more faithful, human-aligned attributions.
Contribution
The paper presents FreeMCG, a novel derivative-free approach using ensemble Kalman filters and diffusion models for better model explanations on the data manifold.
Findings
Achieves state-of-the-art results in counterfactual generation.
Provides more faithful and human-aligned feature attributions.
Requires only model output access, enhancing black-box applicability.
Abstract
Gradient-based methods are a prototypical family of explainability techniques, especially for image-based models. Nonetheless, they have several shortcomings: they (1) require white-box access to models, (2) are vulnerable to adversarial attacks, and (3) produce attributions that lie off the image manifold, leading to explanations that are not actually faithful to the model and do not align well with human perception. To overcome these challenges, we introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG), a novel method that serves as a better basis than the traditional gradient for explaining a given neural network. Specifically, by leveraging ensemble Kalman filters and diffusion models, we derive a derivative-free approximation of the model's gradient projected onto the data manifold, requiring access only to the model's outputs. We demonstrate…
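To make the "derivative-free" idea concrete, here is a minimal sketch of the generic ensemble (Kalman-filter-style) gradient surrogate, assuming only black-box evaluations of a scalar function. It is not the paper's FreeMCG algorithm: the diffusion model and the manifold constraint are omitted, and the name `ensemble_gradient` and the toy linear function are hypothetical, chosen for illustration. The estimate is the cross-covariance between particle positions and function values, which for a linear function equals the ensemble covariance times the true gradient.

```python
import numpy as np

def ensemble_gradient(f, particles):
    """Covariance-preconditioned gradient estimate from function values only.

    particles: (K, d) array of ensemble members; f maps a (d,) array to a scalar.
    Returns (1/K) * sum_i (f(x_i) - mean_f) * (x_i - mean_x). For a linear f
    this equals C_xx @ grad f exactly, where C_xx is the ensemble covariance.
    """
    values = np.array([f(x) for x in particles])   # black-box evaluations
    dx = particles - particles.mean(axis=0)        # centered particles, (K, d)
    df = values - values.mean()                    # centered outputs, (K,)
    return dx.T @ df / len(particles)

# Toy check (assumed setup): f(x) = a . x has gradient a everywhere.
rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5])
particles = rng.normal(size=(64, 3))
est = ensemble_gradient(lambda x: a @ x, particles)
cov = np.cov(particles, rowvar=False, bias=True)   # ensemble covariance C_xx
print(np.allclose(est, cov @ a))
```

No derivative of `f` is ever taken; the direction comes entirely from how outputs co-vary with particle positions, which is why output access alone suffices.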
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Numerical methods in inverse problems · Model Reduction and Neural Networks
Methods: Diffusion · ALIGN
