Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings

Inês Oliveira e Silva; Sérgio Jesus; Iker Perez; Rita P. Ribeiro; Carlos Soares; Hugo Ferreira; Pedro Bizarro

arXiv:2604.22662 · cs.LG · April 27, 2026

TL;DR

This paper critically examines how current Shapley value explanations for AI are evaluated, revealing a disconnect between quantitative metrics and human decision-making in high-stakes environments.

Contribution

The paper introduces a unified framework to compare eight Shapley variants and demonstrates that existing quantitative metrics do not align with human utility or decision confidence.

Findings

01. Quantitative metrics are decoupled from human-perceived clarity.

02. Explanations increase decision confidence but do not improve objective performance.

03. Current evaluation proxies are insufficient for predicting human impact.

Abstract

Shapley values are a cornerstone of explainable AI, yet their proliferation into competing formulations has created a fragmented landscape with little consensus on practical deployment. While theoretical differences are well-documented, evaluation remains reliant on quantitative proxies whose alignment with human utility is unverified. In this work, we use a unified amortized framework to isolate semantic differences between eight Shapley variants under the low-latency constraints of operational risk workflows. We conduct a large-scale empirical evaluation across four risk datasets and a realistic fraud-detection environment involving professional analysts and 3,735 case reviews. Our results reveal a fundamental misalignment: standard quantitative metrics, such as sparsity and faithfulness, are decoupled from human-perceived clarity and decision utility. Furthermore, while no…
