LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets
Aidan Vyas

TL;DR
LemonadeBench v0.5 is a benchmark that evaluates large language models' economic reasoning, planning, and decision-making in a simulated business environment, revealing their strengths and limitations.
Contribution
This work introduces LemonadeBench v0.5, a novel benchmark for assessing economic intuition and strategic decision-making in large language models.
Findings
All evaluated models achieve profitability, and performance scales with model sophistication.
Frontier models capture 70% of the theoretical optimal profit, more than 10x what basic models earn.
Across six dimensions of business efficiency, models optimize locally rather than globally, excelling in some areas while showing blind spots in others.
Abstract
We introduce LemonadeBench v0.5, a minimal benchmark for evaluating economic intuition, long-term planning, and decision-making under uncertainty in large language models (LLMs) through a simulated lemonade stand business. Models must manage inventory with expiring goods, set prices, choose operating hours, and maximize profit over a 30-day period, tasks that any small business owner faces daily. All models demonstrate meaningful economic agency by achieving profitability, with performance scaling dramatically with sophistication: basic models earn minimal profits, while frontier models capture 70% of the theoretical optimum, a greater than 10x improvement. Yet our decomposition of business efficiency across six dimensions reveals a consistent pattern: models achieve local rather than global optimization, excelling in select areas while exhibiting surprising blind spots elsewhere.
Taxonomy
Topics: Stock Market Forecasting Methods · Forecasting Techniques and Applications · Complex Systems and Time Series Analysis
