POSIX: A Prompt Sensitivity Index For Large Language Models
Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty

TL;DR
This paper introduces POSIX, a new index to measure prompt sensitivity in large language models, revealing how prompt variations affect output and how different factors influence sensitivity.
Contribution
The paper proposes POSIX, a novel prompt sensitivity index, and demonstrates its effectiveness in evaluating and comparing prompt sensitivity across various LLMs.
Findings
Adding few-shot exemplars reduces prompt sensitivity.
Prompt template alterations significantly increase sensitivity in MCQ tasks.
Paraphrasing prompts causes high sensitivity in open-ended tasks.
Abstract
Despite their remarkable capabilities, Large Language Models (LLMs) are surprisingly sensitive to minor variations in prompts, such as spelling errors, altered wording, or a changed prompt template, and often generate significantly divergent outputs in response. However, when assessing the quality of an LLM, the focus tends to be solely on its performance on downstream tasks, and little to no attention is paid to prompt sensitivity. To fill this gap, we propose POSIX, a novel PrOmpt Sensitivity IndeX, as a reliable measure of prompt sensitivity, thereby offering a more comprehensive evaluation of LLM performance. The key idea behind POSIX is to capture the relative change in log-likelihood of a given response upon replacing the corresponding prompt with a different intent-preserving prompt. We provide thorough empirical evidence…
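The key idea in the abstract can be illustrated with a small sketch. The abstract is truncated here and does not give the index's formula, so the code below is an assumption-laden illustration, not the paper's definition: the function name `sensitivity_sketch`, the averaging over ordered prompt pairs, the absolute difference, and the normalization by response length are all hypothetical choices. It assumes the log-likelihoods have already been computed by some model and are passed in as a matrix.

```python
from typing import Sequence


def sensitivity_sketch(loglik: Sequence[Sequence[float]],
                       response_lengths: Sequence[int]) -> float:
    """Illustrative prompt-sensitivity score (hypothetical, not the paper's formula).

    loglik[i][j] is log P(response_j | prompt_i), where response_j is the
    response the model produced for prompt_j. All prompts are assumed to be
    intent-preserving variants of one another.
    """
    n = len(loglik)
    if n < 2:
        raise ValueError("need at least two intent-preserving prompts")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Change in the log-likelihood of response_j when its own prompt
            # (prompt_j) is replaced by prompt_i. Dividing by the response
            # length is an assumed normalization.
            total += abs(loglik[i][j] - loglik[j][j]) / response_lengths[j]
    # Average over all ordered pairs (i, j) with i != j.
    return total / (n * (n - 1))


# Two prompt variants with made-up log-likelihood values.
example_loglik = [[-2.0, -4.0],
                  [-3.0, -1.0]]
example_lengths = [2, 3]
score = sensitivity_sketch(example_loglik, example_lengths)
```

Under this sketch, a score of zero means every response is equally likely under every prompt variant, and larger values mean that swapping in an intent-preserving prompt shifts the response likelihoods more.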
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Focus
