DETAIL Matters: Measuring the Impact of Prompt Specificity on Reasoning in Large Language Models
Olivia Kim

TL;DR
This paper introduces the DETAIL framework to evaluate how prompt specificity affects reasoning accuracy in large language models, revealing that more detailed prompts generally improve performance, especially for smaller models and procedural tasks.
Contribution
The study systematically quantifies prompt specificity and demonstrates its impact on LLM reasoning, providing new tools and insights for adaptive prompt design.
Findings
Higher prompt specificity improves accuracy in reasoning tasks.
Smaller models benefit more from detailed prompts.
Procedural tasks see significant gains with increased prompt detail.
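Each finding above compares accuracy across specificity levels, per model or per task type. The sketch below shows one way such a comparison could be tabulated. It assumes each evaluated answer has already been judged correct or incorrect (the paper uses a GPT-based semantic-equivalence check for this) and that specificity levels are integers, with higher meaning more detailed. The record layout, the function names `accuracy_by_level` and `specificity_gain`, and the model names are hypothetical, and the toy records are illustrative only. They are not the paper's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (model name, specificity level, judged correct?)
Judgment = Tuple[str, int, bool]


def accuracy_by_level(judgments: Iterable[Judgment]) -> Dict[Tuple[str, int], float]:
    """Fraction of answers judged correct for each (model, specificity level) pair."""
    counts = defaultdict(lambda: [0, 0])  # (model, level) -> [num_correct, num_total]
    for model, level, correct in judgments:
        tally = counts[(model, level)]
        tally[0] += int(correct)
        tally[1] += 1
    return {key: c / t for key, (c, t) in counts.items()}


def specificity_gain(acc: Dict[Tuple[str, int], float], model: str, low: int, high: int) -> float:
    """Accuracy change for one model when moving from a vaguer level to a more detailed one."""
    return acc[(model, high)] - acc[(model, low)]


# Illustrative toy records only, not results from the paper.
toy = [
    ("model-a", 1, False), ("model-a", 1, True),
    ("model-a", 3, True), ("model-a", 3, True),
]
acc = accuracy_by_level(toy)
print(acc)                                      # {('model-a', 1): 0.5, ('model-a', 3): 1.0}
print(specificity_gain(acc, "model-a", 1, 3))   # 0.5
```

Computing the gain separately per model is what allows a claim such as "smaller models benefit more" to be checked: compare `specificity_gain` across models over the same tasks and levels.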
Abstract
Prompt design plays a critical role in the reasoning performance of large language models (LLMs), yet the impact of prompt specificity (how detailed or vague a prompt is) remains understudied. This paper introduces DETAIL, a framework for evaluating LLM performance across varying levels of prompt specificity. We generate multi-level prompts using GPT-4, quantify specificity via perplexity, and assess correctness using GPT-based semantic-equivalence judgments. Experiments on 30 novel reasoning tasks with GPT-4 and o3-mini reveal that greater specificity improves accuracy, especially for smaller models and procedural tasks. Our results highlight the need for adaptive prompting strategies, and we release tools and data to support further research.
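The abstract states that specificity is quantified via perplexity. For reference, the standard perplexity of a token sequence $t_1, \dots, t_N$ under a language model is $\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(t_i \mid t_{<i})\right)$. The minimal sketch below computes it from per-token natural-log probabilities. This summary does not say which model scores the prompts, how the log-probabilities are obtained, or which direction of perplexity DETAIL treats as "more specific". The function name `perplexity` and its input format are therefore assumptions for illustration.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities log p(t_i | t_<i).

    Assumes the log-probabilities come from some scoring language model;
    which model DETAIL actually uses is not specified here.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 4))
```

Averaging over tokens before exponentiating keeps scores comparable between short and long prompts, which matters when prompts at different specificity levels differ greatly in length.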
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
