Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals

Michael Todasco (Visiting Fellow at the James Silberrad Center for Artificial Intelligence, San Diego State University)

arXiv:2512.05998 · cs.AI · December 9, 2025

TL;DR

This pilot study tests whether framing LLM evaluations as a prediction market with a virtual currency improves forecasting accuracy and surfaces confidence signals. It finds a modest accuracy gain and a clear confidence signal in stake size.

Contribution

The paper introduces a betting-based evaluation framework for LLMs that surfaces calibrated confidence signals and shows potential for risk-aware forecasting.

Findings

1. The Incentive condition showed a modest accuracy improvement over Control (81.5% vs. 79.1%).

2. Large bets were associated with much higher correctness (~99%).

3. Bet size produced visible confidence signals that binary correct/incorrect predictions lack.
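The large-bet finding above is the kind of result a stake-bucketed accuracy table would show. As a minimal sketch, assuming each prediction is logged with the predictor's forecast, the true outcome, and the wager, the Python below groups predictions into stake buckets and reports forecast accuracy per bucket. The `Bet` record, the `accuracy_by_stake` function, and the 10,000 / 50,000 LLMCoin bucket thresholds are hypothetical illustrations, not the paper's analysis code or cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    predicted_correct: bool  # predictor's forecast: will the baseline answer correctly?
    actually_correct: bool   # whether the baseline really answered correctly
    wager: int               # LLMCoin staked on this forecast (1..100,000 per the abstract)


def accuracy_by_stake(bets, thresholds=(10_000, 50_000)):
    """Group bets into hypothetical stake buckets and return forecast accuracy per bucket.

    A forecast counts as correct when predicted_correct matches actually_correct.
    Empty buckets map to None rather than raising a division error.
    """
    low, high = thresholds  # hypothetical cutoffs, not taken from the paper
    buckets = {"small": [], "medium": [], "large": []}
    for b in bets:
        if b.wager < low:
            key = "small"
        elif b.wager < high:
            key = "medium"
        else:
            key = "large"
        buckets[key].append(b.predicted_correct == b.actually_correct)
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}
```

If stake size carries a confidence signal, accuracy should rise from the small to the large bucket; binary Control predictions have no wager field, so no such breakdown is possible for them.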

Abstract

Large language models are increasingly used to evaluate other models, yet these judgments typically lack any representation of confidence. This pilot study tests whether framing an evaluation task as a betting game (a fictional prediction market with its own LLM currency) improves forecasting accuracy and surfaces calibrated confidence signals. We generated 100 math and logic questions with verifiable answers. Six Baseline models (three current-generation, three prior-generation) answered all items. Three Predictor models then forecast, for each question-baseline pair, whether the baseline would answer correctly. Each predictor completed matched runs in two conditions: Control (simple correct/incorrect predictions) and Incentive (predictions plus wagers of 1-100,000 LLMCoin under even odds, starting from a 1,000,000 LLMCoin bankroll). Across 5,400 predictions per condition, Incentive runs…
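The Incentive condition's payoff rule can be illustrated with a short sketch. Under even odds, a correct forecast wins the amount staked and an incorrect one loses it, so for a predictor whose forecast is right with probability p the expected profit of a bet is wager * (2p - 1). A large stake only pays off in expectation when the predictor is confident, which is why wager size can act as a confidence signal. The code assumes that reading of "even odds"; the function names and the rule that a wager may not exceed the current bankroll are hypothetical, not taken from the paper.

```python
# Minimal sketch of even-odds bet settlement (hypothetical helper names).
START_BANKROLL = 1_000_000         # starting LLMCoin bankroll, per the abstract
MIN_WAGER, MAX_WAGER = 1, 100_000  # allowed wager range, per the abstract


def settle(bankroll: int, wager: int, predicted_correct: bool, actually_correct: bool) -> int:
    """Settle one even-odds bet: win the stake if the forecast matches the outcome, lose it otherwise."""
    if not MIN_WAGER <= wager <= MAX_WAGER:
        raise ValueError(f"wager must be in [{MIN_WAGER}, {MAX_WAGER}], got {wager}")
    # Assumption (not stated in the abstract): a predictor cannot stake more than it holds.
    if wager > bankroll:
        raise ValueError("wager exceeds current bankroll")
    won = predicted_correct == actually_correct
    return bankroll + wager if won else bankroll - wager


def run_market(bets: list[tuple[int, bool, bool]], bankroll: int = START_BANKROLL) -> int:
    """Apply (wager, predicted_correct, actually_correct) bets in order and return the final bankroll."""
    for wager, predicted, actual in bets:
        bankroll = settle(bankroll, wager, predicted, actual)
    return bankroll


def expected_profit(p_correct: float, wager: int) -> float:
    """Expected profit of an even-odds bet whose forecast is right with probability p_correct."""
    return wager * (2 * p_correct - 1)
```

For example, `run_market([(100_000, True, True), (50_000, True, False)])` starts at 1,000,000, wins 100,000, then loses 50,000, ending at 1,050,000. At p = 0.5 the expected profit is zero for any stake, so a rational bettor should reserve maximum wagers for forecasts it is nearly sure of.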

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Sports Analytics and Performance