Outcome-based Reinforcement Learning to Predict the Future

Benjamin Turtel; Danny Franklin; Kris Skotheim; Luke Hewitt; Philipp Schoenegger

arXiv:2505.17989 · cs.LG · December 2, 2025

TL;DR

This paper shows that reinforcement learning with verifiable rewards (RLVR) can be applied to forecasting real-world events: a compact 14B reasoning model matches or surpasses the predictive accuracy of frontier models such as o1, and its bets would have returned over 10% in a Polymarket trading simulation.

Contribution

The authors apply RLVR to real-world event prediction, training on a new dataset of recent prediction-market questions paired with relevant news headlines, and compare training approaches that improve the model's accuracy and probabilistic calibration.

Findings

01

A 14B model matches or surpasses frontier models in prediction accuracy.

02

The model's bets would have yielded over 10% ROI in a trading simulation.

03

Enhanced training methods improve stability and calibration.
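The ROI figure in finding 02 can be made concrete with a minimal sketch of a binary-market trading simulation. The betting rule here (a fixed stake on whichever side the model prices above the market) and the names `Question` and `simulate_roi` are assumptions for illustration; the page does not describe the rule the authors actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    """Hypothetical record for one resolved binary market."""
    market_price: float  # price of a YES share, strictly between 0 and 1
    model_prob: float    # model's forecast probability of YES
    outcome: int         # 1 if the market resolved YES, else 0


def simulate_roi(questions: List[Question], stake: float = 1.0) -> float:
    """Return (payout - invested) / invested under an assumed fixed-stake rule."""
    invested = 0.0
    payout = 0.0
    for q in questions:
        if q.model_prob > q.market_price:
            # Model thinks YES is underpriced: buy YES shares, each pays 1 on YES.
            shares = stake / q.market_price
            invested += stake
            payout += shares * q.outcome
        elif q.model_prob < q.market_price:
            # Model thinks YES is overpriced: buy NO shares, each pays 1 on NO.
            shares = stake / (1.0 - q.market_price)
            invested += stake
            payout += shares * (1 - q.outcome)
        # If the model agrees with the market exactly, place no bet.
    if invested == 0.0:
        return 0.0
    return (payout - invested) / invested
```

Under this rule, a single winning YES bet bought at a price of 0.4 pays 2.5 per unit staked, for an ROI of 1.5. A real simulation would also need to account for fees, liquidity, and the time at which prices are sampled, none of which are modeled here.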

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods towards forecasting future real-world events - a challenging task for RL due to the very noisy (and delayed) outcomes involved. Using a novel dataset of recent questions from a prediction market, and accompanying relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare approaches used in training our model,…
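To illustrate what a verifiable reward and a calibration measure could look like for a probability forecast, here is a minimal sketch. It assumes a Brier-score-based reward and an equal-width-bin expected calibration error; the abstract as shown does not say which reward or calibration metric the authors use, and the function names are hypothetical.

```python
from typing import Sequence


def brier_score(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 outcome."""
    return (prob - outcome) ** 2


def brier_reward(prob: float, outcome: int) -> float:
    """Assumed reward: 1 minus the Brier score, so higher is better."""
    return 1.0 - brier_score(prob, outcome)


def mean_brier(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Average Brier score over a set of resolved questions."""
    return sum(brier_score(p, y) for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between forecast and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        ece += abs(mean_p - mean_y) * len(members) / total
    return ece
```

Because the reward depends only on the stated probability and the resolved outcome, it can be computed automatically once a question resolves, which is what makes it verifiable. It is also noisy: a well-calibrated 0.7 forecast still scores poorly on the 30% of questions that resolve the other way.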

Code & Models

Datasets

LightningRodLabs/WWTD-2025
dataset · 311 downloads

Taxonomy

Methods

Sparse Evolutionary Training