llmSHAP: A Principled Approach to LLM Explainability
Filip Naudot, Tobias Sundqvist, Timotheus Kampik

TL;DR
This paper investigates the application of Shapley value-based feature attribution to large language models, analyzing how their stochastic inference affects explainability guarantees and which trade-offs are involved.
Contribution
It provides a principled analysis of when Shapley value principles hold in stochastic LLMs and explores the trade-offs between explainability, speed, and accuracy.
Findings
Satisfaction of the Shapley value principles cannot always be guaranteed when LLM inference is stochastic.
Trade-offs exist between inference speed, attribution accuracy, and principle satisfaction.
Different implementation variants affect the reliability of Shapley-based explanations.
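To make the point about implementation variants concrete, here is a minimal sketch of one way stochasticity can interfere with a Shapley principle (efficiency: attributions sum to v(N) - v(∅)). It is not taken from the paper. The noisy value function `make_noisy_value`, the `cached` wrapper, and the feature names are hypothetical stand-ins for an LLM-based scoring step. The sketch compares resampling the model on every call against sampling each coalition once and reusing the result.

```python
import itertools
import math
import random


def shapley_values(players, value):
    """Exact Shapley values via the subset formula; `value` maps a frozenset to a float."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def make_noisy_value(rng, noise=0.5):
    """Hypothetical stand-in for an LLM score: coalition size plus sampling noise."""
    def value(coalition):
        return len(coalition) + rng.gauss(0.0, noise)
    return value


def cached(value):
    """Wrap a value function so each coalition is evaluated only once."""
    memo = {}

    def wrapper(coalition):
        if coalition not in memo:
            memo[coalition] = value(coalition)
        return memo[coalition]
    return wrapper


players = ["f1", "f2", "f3"]  # assumed feature names, for illustration only
full, empty = frozenset(players), frozenset()

# Variant 1: every call resamples, so v(S) differs between uses.
fresh = make_noisy_value(random.Random(0))
phi_fresh = shapley_values(players, fresh)
gap_fresh = abs(sum(phi_fresh.values()) - (fresh(full) - fresh(empty)))

# Variant 2: one sample per coalition, reused everywhere.
fixed = cached(make_noisy_value(random.Random(0)))
phi_fixed = shapley_values(players, fixed)
gap_fixed = abs(sum(phi_fixed.values()) - (fixed(full) - fixed(empty)))

print(f"efficiency gap, resampled: {gap_fresh:.4f}")
print(f"efficiency gap, cached:    {gap_fixed:.4f}")
```

With caching, the value function is fixed for the duration of the computation, so the efficiency gap is zero up to floating-point error. With resampling, the gap is generally nonzero. Caching only restores the principle relative to the sampled outputs; whether those samples represent the model's typical behavior is a separate question.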
Abstract
Feature attribution methods help make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output. A particularly popular attribution method is based on the Shapley value from cooperative game theory, a measure that guarantees the satisfaction of several desirable principles, assuming deterministic inference. We apply the Shapley value to feature attribution in large language model (LLM)-based decision support systems, where inference is, by design, stochastic (non-deterministic). We then demonstrate when we can and cannot guarantee Shapley value principle satisfaction across different implementation variants applied to LLM-based decision support, and analyze how the stochastic nature of LLMs affects these guarantees. We also highlight trade-offs between explainable inference speed, agreement with exact Shapley value…
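As a rough illustration of the trade-off between speed and agreement with the exact Shapley value, the sketch below compares an exact computation with a Monte Carlo estimate over random orderings. It is a sketch under stated assumptions, not the paper's implementation. The value function `toy_value`, its weights, and the player names are hypothetical and deterministic, chosen so the exact values are easy to check by hand.

```python
import itertools
import random


def toy_value(coalition):
    """Hypothetical deterministic score: additive weights plus one interaction."""
    weights = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 0.0}
    score = sum(weights[p] for p in coalition)
    if "a" in coalition and "b" in coalition:
        score += 1.5  # interaction term, split equally between "a" and "b"
    return score


def exact_shapley(players, value):
    """Average each player's marginal contribution over all n! orderings."""
    phi = dict.fromkeys(players, 0.0)
    orderings = list(itertools.permutations(players))
    for order in orderings:
        seen = frozenset()
        for p in order:
            phi[p] += value(seen | {p}) - value(seen)
            seen = seen | {p}
    return {p: total / len(orderings) for p, total in phi.items()}


def sampled_shapley(players, value, num_samples, rng):
    """Monte Carlo estimate: average over randomly drawn orderings."""
    phi = dict.fromkeys(players, 0.0)
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        seen = frozenset()
        for p in order:
            phi[p] += value(seen | {p}) - value(seen)
            seen = seen | {p}
    return {p: total / num_samples for p, total in phi.items()}


players = ["a", "b", "c", "d"]
exact = exact_shapley(players, toy_value)
approx = sampled_shapley(players, toy_value, num_samples=200, rng=random.Random(0))
max_error = max(abs(exact[p] - approx[p]) for p in players)

print("exact: ", exact)
print("approx:", approx)
print("max abs error:", max_error)
```

The exact routine evaluates all n! orderings, which becomes infeasible quickly as the number of features grows. The sampled routine makes a fixed number of passes at the cost of estimation error. For this deterministic toy game, each sampled ordering's marginal contributions telescope to v(N) - v(∅), so the estimate still satisfies efficiency. That argument relies on each coalition having a single fixed value.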
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning
