From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks
Thodoris Lymperopoulos, Denia Kanellopoulou

TL;DR
This paper introduces a feature attribution approach for Fully Connected Neural Networks (FCNNs) that perturbs the weights attached to each input feature rather than the feature values themselves, aiming for more reliable explanations with competitive performance.
Contribution
It proposes estimating feature attribution through weight perturbation, yielding two methods, XWP and XWP_c, for interpreting FCNNs.
Findings
XWP and XWP_c achieve competitive results on baseline metrics.
Weight perturbation offers a new perspective on attribution, aimed at mitigating common limitations of Occlusion techniques such as Added Bias and Out-of-Distribution data.
The methods enhance the robustness and reliability of explanations.
Abstract
Fully Connected Neural Networks (FCNNs) are often regarded as simple and intuitive architectures, yet they serve as the foundation for more complex models. Nonetheless, the lack of consensus on their interpretability continues to pose challenges, underscoring the enduring relevance of simpler, attribution-based approaches for understanding even the most advanced neural architectures. In this regard, we explore a novel idea for estimating feature attribution, by applying perturbation to the features' attached weights instead of their values. This method offers a fresh perspective aimed at mitigating common limitations in Occlusion techniques, such as Added Bias and Out-of-Distribution data. The application of this rule leads to the formation of a pair of novel attribution methods we call XWP and XWP_c. Founded on simple rules, our methods achieve competitive performance in identifying…
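
To make the core idea concrete, here is a minimal NumPy sketch of attributing a prediction by perturbing the first-layer weights attached to each input feature while leaving the input itself untouched. It is not the paper's XWP or XWP_c rule, which the truncated abstract above does not specify. The function names, the element-wise multiplicative Gaussian noise, and the parameters `sigma` and `n_samples` are all assumptions chosen for illustration.

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass of a small ReLU FCNN; the last layer is linear."""
    h = x
    for k, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if k < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def weight_perturbation_attribution(x, weights, biases,
                                    sigma=0.1, n_samples=200, seed=0):
    """Hypothetical attribution score per input feature.

    For feature i, the column of first-layer weights attached to it is
    multiplied element-wise by (1 + noise). The score is the mean absolute
    change in the network output. The input x is never modified, so the
    network is always evaluated on the original data point.
    """
    rng = np.random.default_rng(seed)
    base = forward(x, weights, biases)
    scores = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        total = 0.0
        for _ in range(n_samples):
            W0 = weights[0].copy()
            # Assumption: independent Gaussian noise per weight in column i.
            noise = rng.normal(0.0, sigma, size=W0.shape[0])
            W0[:, i] = W0[:, i] * (1.0 + noise)
            out = forward(x, [W0] + list(weights[1:]), biases)
            total += np.abs(out - base).sum()
        scores[i] = total / n_samples
    return scores

# Toy network: the output ignores the third hidden unit, and therefore
# the third input feature.
weights = [np.eye(3), np.array([[1.0, 1.0, 0.0]])]
biases = [np.zeros(3), np.zeros(1)]
x = np.array([2.0, 1.0, 3.0])
scores = weight_perturbation_attribution(x, weights, biases)
print(scores)
```

In this toy network the third feature cannot influence the output, so its score is zero, while the first feature scores higher than the second because it carries a larger value through an identical weight. Independent noise per weight matters here: scaling a whole column by a single factor would be equivalent to scaling the feature value itself, which is ordinary feature perturbation.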
