Prompt Engineering Large Language Models' Forecasting Capabilities
Philipp Schoenegger, Cameron R. Jones, Philip E. Tetlock, and Barbara Mellers

TL;DR
This study evaluates the effectiveness of prompt engineering in improving large language models' forecasting accuracy, finding limited gains from prompt modifications and highlighting the need for more advanced techniques for complex tasks.
Contribution
It provides empirical evidence that simple prompt modifications are insufficient for enhancing forecasting accuracy in large language models, emphasizing the necessity for more robust methods.
Findings
Most prompt modifications yield negligible improvements.
References to base rates slightly improve accuracy.
Encouraging Bayesian reasoning can negatively impact performance.
Abstract
Large language model performance can be improved in a large number of ways. Many such techniques, like fine-tuning or advanced tool usage, are time-intensive and expensive. Although prompt engineering is significantly cheaper and often works for simpler tasks, it remains unclear whether prompt engineering suffices for more complex domains like forecasting. Here we show that small prompt modifications rarely boost forecasting accuracy beyond a minimal baseline. In our first study, we tested 38 prompts across Claude 3.5 Sonnet, Claude 3.5 Haiku, GPT-4o, and Llama 3.1 405B. In our second, we introduced compound prompts and prompts from external sources, also including the reasoning models o1 and o1-mini. Our results show that most prompts lead to negligible gains, although references to base rates yield slight benefits. Surprisingly, some strategies showed strong negative effects on…
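As a rough illustration of how a prompt's forecasts might be compared against a baseline prompt, the sketch below scores probability forecasts with the Brier score (mean squared error against binary outcomes, lower is better). The choice of the Brier score is an assumption; this summary does not state which accuracy metric the study used. The function names, prompt labels, and all numbers are hypothetical and do not reproduce the paper's data or results.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return mean((f - o) ** 2 for f, o in zip(forecasts, outcomes))


def gains_over_baseline(forecasts_by_prompt, outcomes, baseline="baseline"):
    """Return baseline Brier score minus each prompt's Brier score.

    Positive values mean the prompt was more accurate than the baseline;
    negative values mean it was less accurate.
    """
    base = brier_score(forecasts_by_prompt[baseline], outcomes)
    return {
        name: base - brier_score(forecasts, outcomes)
        for name, forecasts in forecasts_by_prompt.items()
        if name != baseline
    }


# Invented example: four resolved binary questions (1 = happened, 0 = did not).
outcomes = [1, 0, 0, 1]

# Invented forecasts for a baseline prompt and two hypothetical variants.
forecasts_by_prompt = {
    "baseline": [0.5, 0.5, 0.5, 0.5],
    "prompt_a": [0.75, 0.25, 0.25, 0.75],
    "prompt_b": [0.25, 0.75, 0.5, 0.5],
}

gains = gains_over_baseline(forecasts_by_prompt, outcomes)
for name, gain in gains.items():
    print(f"{name}: gain over baseline = {gain:+.4f}")
```

In a real evaluation the per-question differences would also need a significance test across many questions and models, since small average gains like those described in the abstract can easily arise by chance.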
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
