A Regression Framework for Understanding Prompt Component Impact on LLM Performance
Andrew Lauziere, Jonathan Daugherty, Taisa Kushner

TL;DR
This paper introduces a statistical regression framework to analyze how different prompt components affect large language model performance, providing insights into prompt design and misinformation impact.
Contribution
The authors extend explainable artificial intelligence (XAI) methods to LLMs by developing a regression-based approach that quantifies the influence of individual prompt features on model outcomes.
Findings
Regression models on prompt features explain over 70% of the variation in performance for both Mistral-7B (72%) and GPT-OSS-20B (77%).
Incorrect example pairs hinder model problem-solving abilities.
Positive and negative instructions have contradictory effects on performance.
Abstract
As large language models (LLMs) continue to improve and see further integration into software systems, so does the need to understand the conditions in which they will perform. We contribute a statistical framework for understanding the impact of specific prompt features on LLM performance. The approach extends previous explainable artificial intelligence (XAI) methods specifically to inspect LLMs by fitting regression models relating portions of the prompt to LLM evaluation. We apply our method to compare how two open-source models, Mistral-7B and GPT-OSS-20B, leverage the prompt to solve a simple arithmetic problem. Regression models of individual prompt portions explain 72% and 77% of the variation in model performance, respectively. We find misinformation in the form of incorrect example query-answer pairs impedes both models from solving the arithmetic query, though positive…
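The core idea of relating prompt portions to evaluation outcomes can be illustrated with ordinary least squares on binary "component present / absent" indicators, where R² plays the role of the "percentage of variation explained" quoted above. The sketch below is not the authors' implementation: the component names, the synthetic effect sizes, and the choice of a continuous score with plain OLS are all assumptions made for illustration (a binary correct/incorrect outcome might instead call for logistic regression).

```python
import itertools

import numpy as np

# Hypothetical binary prompt components (illustrative names, not the paper's).
COMPONENTS = ["correct_examples", "incorrect_examples",
              "positive_instruction", "negative_instruction"]


def design_matrix(n_repeats: int) -> np.ndarray:
    """Enumerate every on/off combination of the components, each repeated n_repeats times."""
    combos = np.array(list(itertools.product([0, 1], repeat=len(COMPONENTS))), dtype=float)
    return np.repeat(combos, n_repeats, axis=0)


def fit_ols(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """Fit y ~ b0 + X @ b by least squares; return [b0, b...] and R^2."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation in outcomes
    return coef, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = design_matrix(n_repeats=20)
    # Synthetic "true" effects, invented for illustration only (not results from the paper).
    true_coef = np.array([0.5, 0.2, -0.3, 0.1, -0.05])
    y = np.column_stack([np.ones(len(X)), X]) @ true_coef + rng.normal(0, 0.1, len(X))
    coef, r2 = fit_ols(X, y)
    for name, b in zip(["intercept"] + COMPONENTS, coef):
        print(f"{name:22s} {b:+.3f}")
    print(f"R^2 = {r2:.2f}")
```

Each fitted coefficient estimates how much including that component shifts the score relative to the baseline prompt. A negative coefficient on a feature like the hypothetical `incorrect_examples` would correspond to the kind of harmful effect of misinformation described in the abstract.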
