TL;DR
S-LIME introduces a statistically grounded approach to stabilize local explanations of black-box models, improving reliability and user trust in high-stakes domains.
Contribution
It proposes a hypothesis testing framework based on the central limit theorem to determine the number of perturbations needed for stable explanations.
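The idea of using the central limit theorem to decide how many perturbations are enough can be illustrated with a generic z-test. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `required_perturbations`, the input `diffs` (per-perturbation differences between the scores of two competing candidate features), and the default significance level `alpha` are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def required_perturbations(diffs, alpha=0.05):
    """Return (n, is_stable) for a list of per-perturbation score differences.

    Hypothetical helper: `diffs` holds, for each perturbation point, the
    difference between the scores of the top two candidate features.
    By the central limit theorem, the mean of `diffs` is approximately
    normal, so a two-sided z-test tells us whether the ranking of the two
    features is settled at the current sample size.
    """
    n = len(diffs)
    d_bar = mean(diffs)
    s = stdev(diffs)  # sample standard deviation (needs n >= 2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    if s == 0:
        # No variation: the ranking is settled unless the features tie exactly.
        return (n, True) if d_bar != 0 else (2 * n, False)

    z_stat = d_bar / (s / math.sqrt(n))
    if abs(z_stat) >= z_crit:
        return n, True

    if d_bar == 0:
        # No estimate of the gap yet; fall back to doubling the sample.
        return 2 * n, False

    # Smallest n' with |d_bar| * sqrt(n') / s >= z_crit, assuming the
    # current mean and standard deviation estimates stay roughly the same.
    n_needed = math.ceil((z_crit * s / d_bar) ** 2)
    return max(n_needed, n + 1), False


# Usage: a clear gap is accepted as-is, a noisy gap asks for more samples.
clear_gap = [1.0, 1.2, 0.8, 1.1, 0.9] * 20
noisy_gap = [1.0, -0.9] * 50
n_clear, stable_clear = required_perturbations(clear_gap)
n_noisy, stable_noisy = required_perturbations(noisy_gap)
```

In a loop, one would draw more perturbation points until the test passes or a budget is exhausted. The returned estimate is only a guide, since the mean and standard deviation are themselves re-estimated as samples accumulate.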
Findings
S-LIME reduces explanation instability in experiments.
The method determines how many perturbation points are needed to guarantee a stable explanation.
It demonstrates effectiveness on both simulated and real datasets.
Abstract
An increasing number of machine learning models have been deployed in high-stakes domains such as finance and healthcare. Despite their superior performance, many models are black boxes by nature and are hard to explain. Researchers are making growing efforts to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real…
Taxonomy
Methods: Local Interpretable Model-Agnostic Explanations · High-Order Consensuses
