TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation

Roni Goldshmidt; Miriam Horovicz

arXiv:2407.10114 · cs.CL · July 23, 2024

TL;DR

TokenSHAP introduces a Monte Carlo-based Shapley value method for interpreting large language models by quantifying token importance, improving transparency, and aiding prompt engineering.

Contribution

TokenSHAP adapts Shapley values to LLM prompts, using Monte Carlo sampling to keep token-level attribution computationally efficient.
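
For reference, the standard Shapley value from cooperative game theory, and the generic Monte Carlo estimate of it, can be written as below. The notation is an assumption introduced here for illustration and is not taken from the paper: N is the set of n prompt tokens, v(S) is a hypothetical score for the model's response when only the tokens in S are kept, and S_m is the subset of tokens sampled in draw m (for example, the tokens preceding i in a random permutation).

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!}
  \bigl( v(S \cup \{i\}) - v(S) \bigr)

\hat{\phi}_i = \frac{1}{M} \sum_{m=1}^{M}
  \bigl( v(S_m \cup \{i\}) - v(S_m) \bigr)
```

The exact sum ranges over 2^(n-1) subsets per token, which is why a sampled estimate is attractive for prompts of realistic length.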

Findings

01  Outperforms existing baselines in alignment with human judgments
02  Provides consistent, faithful token importance measures
03  Enhances understanding of token interactions in LLMs

Abstract

As large language models (LLMs) become increasingly prevalent in critical applications, the need for interpretable AI has grown. We introduce TokenSHAP, a novel method for interpreting LLMs by attributing importance to individual tokens or substrings within input prompts. This approach adapts Shapley values from cooperative game theory to natural language processing, offering a rigorous framework for understanding how different parts of an input contribute to a model's response. TokenSHAP leverages Monte Carlo sampling for computational efficiency, providing interpretable, quantitative measures of token importance. We demonstrate its efficacy across diverse prompts and LLM architectures, showing consistent improvements over existing baselines in alignment with human judgments, faithfulness to model behavior, and consistency. Our method's ability to capture nuanced interactions between…
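
The idea of Monte Carlo Shapley estimation over prompt tokens can be illustrated with a minimal sketch. This is not the authors' implementation: it uses permutation sampling, a standard Monte Carlo Shapley estimator that may differ from the paper's sampling scheme, and `toy_value` is a hypothetical stand-in for a real scoring function (in practice one would query an LLM with the reduced prompt and score its response). The names `monte_carlo_shapley`, `toy_value`, and `WEIGHTS` are all illustrative assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence


def monte_carlo_shapley(
    tokens: Sequence[str],
    value_fn: Callable[[List[str]], float],
    num_samples: int = 200,
    seed: int = 0,
) -> Dict[int, float]:
    """Estimate a Shapley value per token index by sampling random permutations."""
    rng = random.Random(seed)
    n = len(tokens)
    totals = [0.0] * n
    for _ in range(num_samples):
        order = list(range(n))
        rng.shuffle(order)
        included: List[int] = []
        prev = value_fn([])  # value of the empty prompt
        for idx in order:
            included.append(idx)
            # Keep the surviving tokens in their original prompt order.
            subset = [tokens[i] for i in sorted(included)]
            cur = value_fn(subset)
            totals[idx] += cur - prev  # marginal contribution of this token
            prev = cur
    return {i: totals[i] / num_samples for i in range(n)}


# Hypothetical additive scoring function, used only so the sketch runs offline.
WEIGHTS = {"sky": 0.6, "blue": 0.3}


def toy_value(subset: List[str]) -> float:
    return sum(WEIGHTS.get(tok, 0.0) for tok in subset)


tokens = ["why", "is", "the", "sky", "blue"]
scores = monte_carlo_shapley(tokens, toy_value, num_samples=50)
for i, tok in enumerate(tokens):
    print(f"{tok}: {scores[i]:.3f}")
```

Because the toy scoring function is additive, each token's estimate matches its weight; with a real model the marginal contributions depend on which other tokens are present, and that dependence is what the averaging over sampled orderings accounts for.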

Code & Models

Repositories

ronigold/TokenSHAP (PyTorch, official)

Videos

TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques