TL;DR
This paper analyzes why large language models (LLMs) are highly sensitive to prompt wording. It examines their internal behavior, derives an upper bound on the difference in next-token log probabilities between meaning-preserving prompts, and relates this bound to prompt sensitivity metrics.
Contribution
It introduces a theoretical framework that models LLMs as multivariate functions, derives bounds on prompt sensitivity, and identifies factors that influence this sensitivity, and it supports the analysis with experiments.
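The bound described in the abstract (a first-order Taylor expansion followed by the Cauchy-Schwarz inequality) can be sketched as below. This assumes hypothetical notation not given in this excerpt: x and x' are continuous representations (for example, input embeddings) of two meaning-preserving prompts, and f(x) is the log probability of a fixed next token y. The paper's exact choice of variables and norm may differ.

```latex
% Hypothetical notation: x, x' \in \mathbb{R}^d represent two meaning-preserving prompts;
% f(x) = \log p_\theta(y \mid x) is the log probability of a fixed next token y.
\begin{aligned}
f(x') &= f(x) + \nabla f(x)^{\top}(x' - x) + O\!\left(\lVert x' - x \rVert_2^{2}\right), \\
\lvert f(x') - f(x) \rvert &\approx \lvert \nabla f(x)^{\top}(x' - x) \rvert
  \le \lVert \nabla f(x) \rVert_2 \, \lVert x' - x \rVert_2 .
\end{aligned}
```

The right-hand side grows with the distance between the two representations. If a model maps meaning-preserving prompts far apart instead of clustering them, the bound stays large even though the prompts mean the same thing.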
Findings
LLMs disperse similar inputs rather than cluster them, increasing prompt sensitivity.
The derived upper bound correlates strongly with the PromptSensiScore metric.
Prompt templates tend to influence logits more than the questions themselves.
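One simple way to compare how much templates and questions move the logits is a two-way main-effects summary over a (template × question) grid: compute the variance of the row means and the variance of the column means. This is a minimal sketch and not the paper's analysis. The function name `main_effect_variances` is hypothetical, and the grid is synthetic data constructed so that the template effect dominates. It demonstrates the measurement, not the finding.

```python
import numpy as np


def main_effect_variances(logits):
    """Return (template variance, question variance) for a (template x question) grid.

    Row means average over questions, so their spread reflects the template effect.
    Column means average over templates, so their spread reflects the question effect.
    Hypothetical helper: a plain main-effects summary, not the paper's method.
    """
    logits = np.asarray(logits, dtype=float)
    template_effect = logits.mean(axis=1)  # one value per template
    question_effect = logits.mean(axis=0)  # one value per question
    return template_effect.var(), question_effect.var()


# Synthetic grid: 4 hypothetical templates x 5 hypothetical questions.
rng = np.random.default_rng(0)
template_shift = np.array([-2.0, -0.5, 0.5, 2.0])[:, None]       # large by construction
question_shift = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])[None, :]  # small by construction
grid = template_shift + question_shift + 0.05 * rng.normal(size=(4, 5))

var_template, var_question = main_effect_variances(grid)
print(f"template variance={var_template:.3f}, question variance={var_question:.3f}")
```

With real model outputs, each grid cell would hold the logit of the target token for one template applied to one question.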
Abstract
Prompt sensitivity, which refers to how strongly the output of a large language model (LLM) depends on the exact wording of its input prompt, raises concerns among users about the LLM's stability and reliability. In this work, we consider LLMs as multivariate functions and perform a first-order Taylor expansion, thereby analyzing the relationship between meaning-preserving prompts, their gradients, and the log probabilities of the model's next token. We derive an upper bound on the difference between log probabilities using the Cauchy-Schwarz inequality. We show that LLMs do not internally cluster similar inputs like smaller neural networks do, but instead disperse them. This dispersing behavior leads to an excessively high upper bound on the difference of log probabilities between two meaning-preserving prompts, making the bound difficult to reduce effectively to 0. In our analysis, we also…
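The Taylor-plus-Cauchy-Schwarz step can be checked numerically on a toy model. This is a minimal sketch, assuming a hypothetical linear-softmax model (logits = W @ x) standing in for an LLM. The names `log_prob`, `grad_log_prob`, `W`, `x`, and `delta` are illustrative and do not come from the paper's code. The sketch compares the exact change in a token's log probability with the first-order Taylor term, and checks that the Taylor term never exceeds the Cauchy-Schwarz bound.

```python
import numpy as np


def log_prob(x, W, token):
    """Log-softmax probability of `token` under the toy model logits = W @ x."""
    logits = W @ x
    logits = logits - logits.max()  # shift for numerical stability
    return logits[token] - np.log(np.exp(logits).sum())


def grad_log_prob(x, W, token):
    """Analytic gradient of log_prob w.r.t. x: W[token] - softmax(W @ x) @ W."""
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return W[token] - p @ W


rng = np.random.default_rng(0)
d, vocab, token = 16, 50, 3
W = rng.normal(size=(vocab, d))
x = rng.normal(size=d)              # stand-in representation of prompt A
delta = 1e-4 * rng.normal(size=d)   # small perturbation: prompt B = x + delta
x_prime = x + delta

g = grad_log_prob(x, W, token)
exact = log_prob(x_prime, W, token) - log_prob(x, W, token)
first_order = g @ delta                              # first-order Taylor term
bound = np.linalg.norm(g) * np.linalg.norm(delta)    # Cauchy-Schwarz bound

print(f"exact={exact:.3e} first_order={first_order:.3e} bound={bound:.3e}")
```

Because the bound is proportional to the norm of `delta`, scaling the distance between the two representations by 10 scales the bound by 10. This is the mechanism by which dispersing meaning-preserving inputs keeps the bound high.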
