Evaluating Local Explanations using White-box Models
Amir Hossein Akhavan Rahnama, Judith Butepage, Pierre Geurts, Henrik Bostrom

TL;DR
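This paper proposes evaluating local explanation techniques by measuring how similar their feature importance scores are to the log odds ratio (LOR) decomposition of models such as logistic regression and naive Bayes. Because the LOR splits naturally into additive per-feature terms, it supplies reference scores without relying on human subjects.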
Contribution
It introduces a novel benchmarking approach for local explanations based on the log odds ratio, applicable to models with additive log-odds decompositions.
Findings
Explanation performance varies with model type and dataset.
Normalization and similarity metrics influence explanation evaluation.
Benchmarking reveals differences among explanation techniques.
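As a rough illustration of why normalization and the choice of similarity metric can change an evaluation, the sketch below compares a hypothetical reference score vector with an explanation that points in the same direction but has a different scale. The function names, the two metrics (cosine similarity and Euclidean distance), and the numbers are assumptions made for this example; the summary above does not say which metrics or normalization the paper uses.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def cosine_similarity(a, b):
    """Cosine of the angle between two score vectors; ignores overall scale."""
    return float(l2_normalize(a) @ l2_normalize(b))

def euclidean_distance(a, b, normalize=False):
    """Euclidean distance, optionally after L2-normalizing both vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if normalize:
        a, b = l2_normalize(a), l2_normalize(b)
    return float(np.linalg.norm(a - b))

# Hypothetical reference scores and an explanation with the same direction
# but one tenth of the magnitude (illustrative values only).
reference = np.array([3.0, -2.0, 2.0])
explanation = 0.1 * reference

cos_sim = cosine_similarity(reference, explanation)
raw_dist = euclidean_distance(reference, explanation)
norm_dist = euclidean_distance(reference, explanation, normalize=True)

print(f"cosine similarity:             {cos_sim:.3f}")
print(f"Euclidean distance (raw):      {raw_dist:.3f}")
print(f"Euclidean distance (L2-normed): {norm_dist:.3f}")
```

In this toy case the explanation agrees perfectly with the reference under cosine similarity and under normalized Euclidean distance, yet looks far from it under raw Euclidean distance, so the same explanation can be ranked very differently depending on these choices.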
Abstract
Evaluating explanation techniques using human subjects is costly, time-consuming and can lead to subjectivity in the assessments. To evaluate the accuracy of local explanations, we require access to the true feature importance scores for a given instance. However, the prediction function of a model usually does not decompose into linear additive terms that indicate how much a feature contributes to the output. In this work, we suggest to instead focus on the log odds ratio (LOR) of the prediction function, which naturally decomposes into additive terms for logistic regression and naive Bayes. We demonstrate how we can benchmark different explanation techniques in terms of their similarity to the LOR scores based on our proposed approach. In the experiments, we compare prominent local explanation techniques and find that the performance of the techniques can depend on the underlying…
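To make the additive decomposition concrete, here is a minimal sketch for logistic regression. It assumes the per-feature term is the product of a weight and the feature value, with the intercept kept separate; the weights, bias, and instance are made-up values, and the function name lor_contributions is hypothetical rather than taken from the paper.

```python
import numpy as np

def lor_contributions(weights, bias, x):
    """Split the log odds of a logistic regression model into per-feature terms.

    Assumes log(p / (1 - p)) = bias + sum_j weights[j] * x[j], so feature j
    contributes weights[j] * x[j].
    """
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)
    contributions = weights * x
    log_odds = bias + contributions.sum()
    return contributions, log_odds

def sigmoid(z):
    """Logistic function mapping log odds to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative model parameters and instance (not from the paper).
weights = np.array([1.5, -2.0, 0.5])
bias = -0.25
x = np.array([2.0, 1.0, 4.0])

contributions, log_odds = lor_contributions(weights, bias, x)
prob = sigmoid(log_odds)

print("per-feature terms:", contributions)
print("log odds:", log_odds)
print("predicted probability:", prob)
# Sanity check: the log odds recovered from the probability match the sum.
print("log(p / (1 - p)):", np.log(prob / (1.0 - prob)))
```

The per-feature terms can then serve as reference importance scores against which the output of a local explanation technique for the same instance is compared.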
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
Methods: Logistic Regression
