Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals
Michael Todasco (Visiting Fellow at the James Silberrad Center for Artificial Intelligence, San Diego State University)

TL;DR
This study explores whether framing LLM evaluations as prediction markets with a virtual currency improves forecasting accuracy and reveals confidence signals. It finds modest accuracy gains and clear confidence signals in stake size.
Contribution
It introduces a novel betting-based evaluation framework for LLMs that surfaces calibrated confidence signals and demonstrates potential for risk-aware forecasting.
Findings
The Incentive condition showed a modest accuracy improvement over Control (81.5% vs. 79.1%).
Large bets were strongly associated with correct predictions (~99% correct).
Betting created visible confidence signals that binary predictions lack.
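One way to check a finding like "large bets are almost always correct" is to group predictions by stake size and compute accuracy per group. The sketch below assumes each prediction is recorded with its wager and whether it turned out correct. The `Bet` record, the `accuracy_by_stake` helper, and the bucket edges are all hypothetical illustrations, not the paper's analysis code.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    stake: int     # LLMCoin wagered on this prediction (hypothetical record)
    correct: bool  # whether the predictor's forecast turned out right


def accuracy_by_stake(bets, edges):
    """Group bets into half-open stake buckets [lo, hi) and return accuracy per bucket.

    Buckets that contain no bets are omitted from the result.
    """
    results = {}
    for lo, hi in zip(edges, edges[1:]):
        in_bucket = [b for b in bets if lo <= b.stake < hi]
        if in_bucket:
            results[(lo, hi)] = sum(b.correct for b in in_bucket) / len(in_bucket)
    return results
```

If stake carries a confidence signal, accuracy should rise from the low-stake buckets to the high-stake ones. The bucket edges are an arbitrary choice here; quantile-based edges would give buckets of similar size.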
Abstract
Large language models are increasingly used to evaluate other models, yet these judgments typically lack any representation of confidence. This pilot study tests whether framing an evaluation task as a betting game (a fictional prediction market with its own LLM currency) improves forecasting accuracy and surfaces calibrated confidence signals. We generated 100 math and logic questions with verifiable answers. Six baseline models (three current-generation, three prior-generation) answered all items. Three predictor models then forecast, for each question-baseline pair, whether the baseline would answer correctly. Each predictor completed matched runs in two conditions: Control (simple correct/incorrect predictions) and Incentive (predictions plus wagers of 1 to 100,000 LLMCoin at even odds, starting from a 1,000,000 LLMCoin bankroll). Across 5,400 predictions per condition, Incentive runs…
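The wager rules in the abstract can be sketched as a simple ledger. Under even odds, a correct prediction wins the amount staked and an incorrect one loses it, so a wager of stake s placed with subjective probability p of being right has expected gain s(2p − 1). Betting is worthwhile only when p > 0.5, and a rational bettor should stake more as p grows. The sketch assumes the bankroll carries over from one wager to the next and that a stake may not exceed the current bankroll. Both are assumptions, as are the names `settle`, `run_ledger`, and `expected_gain`; the paper's actual harness may differ.

```python
STARTING_BANKROLL = 1_000_000      # LLMCoin, per the abstract
MIN_STAKE, MAX_STAKE = 1, 100_000  # allowed wager range, per the abstract


def settle(bankroll: int, stake: int, correct: bool) -> int:
    """Apply one even-odds wager: win the stake if correct, lose it otherwise."""
    if not MIN_STAKE <= stake <= MAX_STAKE:
        raise ValueError(f"stake {stake} outside [{MIN_STAKE}, {MAX_STAKE}]")
    if stake > bankroll:  # assumption: no betting on credit
        raise ValueError("stake exceeds current bankroll")
    return bankroll + stake if correct else bankroll - stake


def run_ledger(wagers) -> int:
    """Settle a sequence of (stake, correct) wagers against a shared bankroll."""
    bankroll = STARTING_BANKROLL
    for stake, correct in wagers:
        bankroll = settle(bankroll, stake, correct)
    return bankroll


def expected_gain(stake: int, p_correct: float) -> float:
    """Expected LLMCoin gain of an even-odds wager given the bettor's confidence."""
    return stake * (2 * p_correct - 1)
```

For example, winning a 100,000 LLMCoin bet and then losing a 50,000 LLMCoin bet leaves the bankroll at 1,050,000. The `expected_gain` helper shows why stake size can serve as a confidence signal: a bettor who believes it is right 75% of the time expects to gain half of whatever it stakes.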
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Sports Analytics and Performance
