Probabilistic Attribution For Large Language Models
Shilpika Shilpika, Carlo Graziani, Bethany Lusch, Venkatram Vishwanath, Michael E. Papka

TL;DR
This paper introduces a probabilistic token attribution method for Large Language Models based on stochastic process theory, enhancing interpretability by analyzing token contributions and model behavior.
Contribution
It presents a model-agnostic attribution measure based on Bayes' rule, providing insight into LLM internal representations and response uncertainties.
Findings
The attribution score highlights parts of the response that are uncertain or unstable.
Entropy analysis reveals model behavior and response stability.
Evaluation across models and prompts uncovers anomalies and sensitivities.
Abstract
The generative nature of Large Language Models (LLMs) is reflected in the conditional probabilities they compute to sample each response token given the previous tokens. These probabilities encode the distributional structure that the model learns in training and exploits in inference. In this work, we use these probabilities to situate LLMs within the mathematical theory of stochastic processes. We use this framework to design a model-agnostic probabilistic token attribution measure, using Bayes' rule to invert the next-token log-probabilities so as to capture the model's internal representation of the distribution over token sequences. The representation is independent of the model's computational structure. This representation yields the conditional probability of the response given the prompt, and of the response given the prompt with a token marginalized away. Our attribution score is…
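The two conditional probabilities named in the abstract can be illustrated on a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical three-token vocabulary and a first-order Markov chain (TRANS, START) standing in for an LLM's next-token distribution, and every function name is invented for illustration. It shows the chain rule giving log P(response | prompt), and Bayes' rule giving log P(response | prompt with one token marginalized away) as a ratio of summed joint probabilities. How the paper combines these quantities into its attribution score is not stated in the truncated abstract, so no score is computed here.

```python
import math

# Hypothetical toy stand-in for an LLM: a first-order Markov chain.
VOCAB = [0, 1, 2]
START = [0.5, 0.3, 0.2]            # P(first token)
TRANS = [                          # TRANS[a][b] = P(next = b | previous = a)
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
]


def next_token_logprobs(context):
    """Return log P(next token | context) for every token in VOCAB."""
    row = START if not context else TRANS[context[-1]]
    return [math.log(p) for p in row]


def seq_logprob(tokens, context=()):
    """Chain rule: log P(tokens | context) as a sum of next-token log-probs."""
    ctx = list(context)
    total = 0.0
    for tok in tokens:
        total += next_token_logprobs(ctx)[tok]
        ctx.append(tok)
    return total


def response_logprob(prompt, response):
    """log P(response | prompt)."""
    return seq_logprob(response, prompt)


def response_logprob_marginalized(prompt, response, i):
    """log P(response | prompt with the token at position i marginalized away).

    Bayes' rule: sum over v of P(prompt[i := v], response), divided by
    sum over v of P(prompt[i := v]).
    """
    numerator = 0.0
    denominator = 0.0
    for v in VOCAB:
        alt = list(prompt)
        alt[i] = v
        denominator += math.exp(seq_logprob(alt))
        numerator += math.exp(seq_logprob(alt + list(response)))
    return math.log(numerator) - math.log(denominator)


if __name__ == "__main__":
    prompt, response = [0, 1, 2], [2, 0]
    print("full prompt:", response_logprob(prompt, response))
    for i in range(len(prompt)):
        print("marginalize position", i, ":",
              response_logprob_marginalized(prompt, response, i))
```

Because the toy chain only looks one token back, marginalizing any prompt token except the last leaves the response probability unchanged, while marginalizing the last one changes it. A real LLM conditions on the whole context, and summing over the full vocabulary at a prompt position would require one forward pass per candidate token, so this brute-force enumeration is only practical for a toy model.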
