Explaining Large Language Models Decisions Using Shapley Values
Behnam Mohammadi

TL;DR
This paper introduces a Shapley value-based method to interpret large language model decisions, revealing how prompt components influence outputs and exposing token noise effects that challenge the use of LLMs as human behavior proxies.
Contribution
It presents a novel, model-agnostic approach using Shapley values to analyze prompt influence and identify biases in LLM decision-making processes.
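For reference, the standard Shapley value from cooperative game theory can be written as follows. Treating the prompt components as the player set $N$ and a scalar summary of the model's output as the value function $v$ is an assumption made here for illustration; this summary does not state the paper's exact choice of $v$.

```latex
% Standard Shapley value of player i in a cooperative game (N, v).
% Assumed mapping (not confirmed by this summary): N = prompt components,
% v(S) = a scalar derived from the model's output when only components in S are present.
\[
  \phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}\,
    \bigl( v(S \cup \{i\}) - v(S) \bigr)
\]
% Efficiency property: the contributions sum to the total change in value.
\[
  \sum_{i \in N} \phi_i(v) \;=\; v(N) - v(\emptyset)
\]
```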
Findings
Shapley values quantify how much each prompt component contributes to the model's output.
Token noise effects can disproportionately influence LLM decisions.
Caution is needed when using LLMs as substitutes for human subjects.
Abstract
The emergence of large language models (LLMs) has opened up exciting possibilities for simulating human behavior and cognitive processes, with potential applications in various domains, including marketing research and consumer behavior analysis. However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain due to glaring divergences that suggest fundamentally different underlying processes at play and the sensitivity of LLM responses to prompt variations. This paper presents a novel approach based on Shapley values from cooperative game theory to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output. Through two applications - a discrete choice experiment and an investigation of cognitive biases - we demonstrate how the Shapley value method can uncover what we term "token noise" effects, a phenomenon…
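A minimal sketch of how per-component Shapley values could be computed is given below. It enumerates every subset of a small set of prompt components and averages each component's marginal contribution. The component names and the `toy_value` function are hypothetical stand-ins: in practice the value function would query an LLM with only the selected components present and return a scalar such as a choice probability, and the paper's actual definition is not shown in this summary.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a set function `value` over `players`.

    `value` takes a frozenset of players and returns a float.
    Cost grows as 2^n, so this is only practical for a handful of components.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(components):
    """Hypothetical stand-in for an LLM-derived score (not the paper's function)."""
    score = 0.0
    if "price" in components:
        score += 0.5
    if "brand" in components:
        score += 0.25
    # An uninformative component that only matters through an interaction.
    if "price" in components and "filler" in components:
        score += 0.125
    return score


components = ["price", "brand", "filler"]
phi = shapley_values(components, toy_value)
for name in components:
    print(f"{name}: {phi[name]:.4f}")
```

In this toy setup the "filler" component carries no standalone information yet still receives a nonzero share because of its interaction with "price". The attributions sum to the difference between the score for the full prompt and the score for the empty prompt.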
Taxonomy
Topics: Digital Rights Management and Security
