When Can You Trust Your Explanations? A Robustness Analysis on Feature Importances
Ilaria Vascotto, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi

TL;DR
This paper introduces a framework for assessing the robustness of neural network explanations to non-adversarial perturbations, emphasizing the importance of trustworthy explanations in AI systems, especially for tabular data.
Contribution
It proposes a novel method leveraging the manifold hypothesis for generating perturbed data and an ensemble approach to improve explanation robustness evaluation.
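To make the idea concrete, here is a minimal sketch of what a robustness check on feature importances could look like. It is not the authors' implementation: this summary does not specify the paper's perturbation scheme, ensemble rule, or robustness metric. As stand-in assumptions, the sketch perturbs a point by moving it part of the way towards one of its nearest neighbours in the observed data (a crude proxy for staying near the data manifold), averages L1-normalised attributions from two simple explainers, and scores robustness as the mean Spearman-style rank correlation between the original and perturbed explanations. All function names and parameters (`perturb_towards_neighbour`, `max_step`, `k`, and so on) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_x_input(w, x):
    """Gradient-times-input attribution for a logistic model p = sigmoid(w . x)."""
    p = sigmoid(w @ x)
    return p * (1.0 - p) * w * x

def occlusion(w, x, baseline):
    """Change in model output when each feature is replaced by a baseline value."""
    full = sigmoid(w @ x)
    out = np.empty_like(x)
    for i in range(x.size):
        x_occ = x.copy()
        x_occ[i] = baseline[i]
        out[i] = full - sigmoid(w @ x_occ)
    return out

def ensemble(attributions):
    """Assumed aggregation rule: mean of L1-normalised attribution vectors."""
    normed = []
    for a in attributions:
        norm = np.abs(a).sum()
        normed.append(a / norm if norm > 0 else a)
    return np.mean(normed, axis=0)

def perturb_towards_neighbour(x, data, k, rng, max_step=0.3):
    """Assumed perturbation: step from x towards one of its k nearest data points."""
    dist = np.linalg.norm(data - x, axis=1)
    order = np.argsort(dist)
    order = order[dist[order] > 0][:k]  # skip x itself if it is in the data
    target = data[rng.choice(order)]
    return x + max_step * rng.random() * (target - x)

def rank_correlation(a, b):
    """Spearman-style correlation computed on ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def robustness(explain, x, data, k=5, n_perturb=50, seed=0):
    """Mean rank agreement between explain(x) and explanations of perturbed points."""
    rng = np.random.default_rng(seed)
    base = explain(x)
    scores = [
        rank_correlation(base, explain(perturb_towards_neighbour(x, data, k, rng)))
        for _ in range(n_perturb)
    ]
    return float(np.mean(scores))

# Toy usage on synthetic tabular data with a fixed logistic model.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))
w = rng.normal(size=6)
x = data[0]
baseline = data.mean(axis=0)

def explain(z):
    return ensemble([grad_x_input(w, z), occlusion(w, z, baseline)])

score = robustness(explain, x, data)
```

A score near 1 means the feature ranking barely changes under these small, data-like perturbations; lower values flag explanations that may not be trustworthy for that point.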
Findings
Robust explanations are crucial for trustworthy AI.
Ensemble explanations enhance robustness assessment.
Experiments on tabular datasets show that the approach is effective.
Abstract
Recent legislative regulations have underlined the need for accountable and transparent artificial intelligence systems and have contributed to a growing interest in the Explainable Artificial Intelligence (XAI) field. Nonetheless, the lack of standardized criteria to validate explanation methodologies remains a major obstacle to developing trustworthy systems. We address a crucial yet often overlooked aspect of XAI, the robustness of explanations, which plays a central role in ensuring trust in both the system and the provided explanation. To this end, we propose a novel approach to analyse the robustness of neural network explanations to non-adversarial perturbations, leveraging the manifold hypothesis to produce new perturbed datapoints that resemble the observed data distribution. We additionally present an ensemble method to aggregate various explanations, showing how merging…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Industrial Vision Systems and Defect Detection · Machine Learning and Data Classification
