Visualizing token importance for black-box language models
Paulius Rauba, Qiyao Wei, Mihaela van der Schaar

TL;DR
This paper introduces Distribution-Based Sensitivity Analysis (DBSA), a practical, model-agnostic method for visualizing and understanding how individual input tokens influence black-box large language model outputs, aiding in auditing and interpretability.
Contribution
The paper presents DBSA, a novel, lightweight technique for assessing token-level sensitivity in black-box LLMs without distributional assumptions, enhancing interpretability tools for practitioners.
Findings
DBSA effectively visualizes token importance in LLM outputs.
It enables quick, plug-and-play sensitivity analysis for black-box models.
Illustrative examples show DBSA can uncover overlooked input sensitivities.
Abstract
We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings, particularly in high-stakes domains such as legal, medical, and regulatory compliance. Existing approaches for LLM auditing often focus on isolated aspects of model behavior, such as detecting specific biases or evaluating fairness. We are interested in a more general question: can we understand how the outputs of black-box LLMs depend on each input token? There is a critical need for such tools in real-world applications that rely on inaccessible API endpoints to language models. However, this is a highly non-trivial problem, as LLMs are stochastic functions (i.e., two outputs for the same prompt can differ by chance), while computing prompt-level gradients to approximate input sensitivity is infeasible. To address this, we propose Distribution-Based…
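The abstract is cut off before it describes the method, so the following is not the authors' DBSA procedure. It is a minimal sketch of the general idea the abstract motivates: because the model is stochastic and gradients are unavailable, one can sample several outputs for the original prompt and for a prompt with one token replaced, then compare the two output distributions. Everything named here is an assumption made for illustration: `toy_black_box` is a hypothetical stand-in for an API call followed by an embedding step, the `[MASK]` replacement is an arbitrary perturbation, and the energy distance is a swapped-in choice of distributional distance.

```python
import math
import random


def toy_black_box(tokens, rng):
    """Hypothetical stand-in for 'call the API, then embed the response'.

    Returns a noisy 2-D point; by construction only the token "not"
    shifts the output distribution.
    """
    shift = 3.0 if "not" in tokens else 0.0
    return (shift + rng.gauss(0, 1), rng.gauss(0, 1))


def mean_pairwise(xs, ys):
    """Average Euclidean distance over all pairs drawn from xs and ys."""
    return sum(math.dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Energy distance between two samples (assumed choice of distance)."""
    return 2 * mean_pairwise(xs, ys) - mean_pairwise(xs, xs) - mean_pairwise(ys, ys)


def token_sensitivity(tokens, model, n_samples=100, mask="[MASK]", seed=0):
    """Score each token by how far masking it moves the output distribution."""
    rng = random.Random(seed)
    baseline = [model(tokens, rng) for _ in range(n_samples)]
    scores = []
    for i in range(len(tokens)):
        # Perturb exactly one position and resample the stochastic model.
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        samples = [model(perturbed, rng) for _ in range(n_samples)]
        scores.append(energy_distance(baseline, samples))
    return scores


prompt = "the drug is not safe".split()
scores = token_sensitivity(prompt, toy_black_box)
for tok, score in zip(prompt, scores):
    print(f"{tok:>5s}  {score:.3f}")
```

In this toy setup the score for "not" should stand out, while the other tokens score near zero because masking them only changes the sampling noise. With a real endpoint, each sample is a paid API call, so the number of samples per token trades cost against how reliably a small shift can be told apart from chance.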
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Business Process Modeling and Analysis · Adversarial Robustness in Machine Learning
