Outcome-based Reinforcement Learning to Predict the Future
Benjamin Turtel, Danny Franklin, Kris Skotheim, Luke Hewitt, Philipp Schoenegger

TL;DR
This paper demonstrates that reinforcement learning with verifiable rewards (RLVR) can be applied to forecasting real-world events, matching frontier-model accuracy and yielding positive simulated trading returns with a compact 14B model.
Contribution
The authors introduce a novel application of RLVR to real-world event prediction. They use a new dataset of recent prediction-market questions paired with relevant news headlines, along with training techniques that improve accuracy and calibration.
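Calibration measures whether predicted probabilities match observed frequencies. For example, events forecast at 70% should happen about 70% of the time. This page does not say which calibration metric the paper reports. The sketch below therefore assumes a common binned expected calibration error (ECE) as an illustration, not as the authors' evaluation code. The function name and the default of 10 bins are hypothetical choices.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned ECE (illustrative assumption, not the paper's stated metric).

    probs: forecast probabilities that each event happens, in [0, 1].
    outcomes: 1 if the event happened, else 0.
    Returns the bin-size-weighted mean of |mean forecast - observed frequency|.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins; p == 1.0 is clamped into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_forecast = sum(p for p, _ in b) / len(b)
        observed_freq = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(mean_forecast - observed_freq)
    return ece
```

In this example, a forecaster who always says 0.95 on coin-flip events scores an ECE of about 0.45. A forecaster who says 0.25 for events that occur one time in four scores 0.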
Findings
A 14B model matches or surpasses frontier models in prediction accuracy.
The model's bets would have yielded over 10% ROI in a trading simulation.
Enhanced training methods improve stability and calibration.
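The ROI figure compares the model's forecasts against market prices. This page does not describe the betting rule, so the sketch below assumes a simple hypothetical one. It stakes a fixed amount on YES when the model's probability exceeds the market price of a YES share, and on NO when it is lower. Each winning share pays 1, and fees and slippage are ignored. The `Question` fields and `simulate_roi` are illustrative names, not the authors' simulation code.

```python
from dataclasses import dataclass


@dataclass
class Question:
    model_prob: float    # model's forecast that the question resolves YES
    market_price: float  # price of one YES share, strictly between 0 and 1
    resolved_yes: bool   # realized outcome


def simulate_roi(questions, stake=1.0):
    """Hypothetical fixed-stake rule: back whichever side the model favours
    relative to the market price. ROI = total profit / total amount staked."""
    total_staked = 0.0
    total_profit = 0.0
    for q in questions:
        if q.model_prob > q.market_price:
            price, wins = q.market_price, q.resolved_yes            # buy YES
        elif q.model_prob < q.market_price:
            price, wins = 1.0 - q.market_price, not q.resolved_yes  # buy NO
        else:
            continue  # no disagreement with the market, so no bet
        shares = stake / price
        payout = shares if wins else 0.0  # each winning share pays 1
        total_staked += stake
        total_profit += payout - stake
    return total_profit / total_staked if total_staked else 0.0
```

For example, buying YES at 0.5 on a question that resolves YES doubles the stake (ROI 1.0). Buying NO at 0.6 on a question that resolves YES loses the whole stake. Real market simulations would also need to account for fees, liquidity, and the price available when the bet is placed.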
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods to forecasting future real-world events, a challenging task for RL because the outcomes involved are very noisy (and delayed). Using a novel dataset of recent questions from a prediction market, paired with relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare approaches used in training our model,…
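In RLVR for forecasting, the reward can only be computed once a question resolves. That is the source of the delay mentioned above, and a single yes/no outcome is a noisy signal about forecast quality. The abstract shown here does not specify the reward. The sketch below assumes the negative Brier score, a standard proper scoring rule for binary forecasts. It also assumes a hypothetical `<answer>p</answer>` output format, with a fixed penalty for unparseable or out-of-range answers. None of these details should be read as the authors' exact reward.

```python
import re

# Hypothetical answer format: the model ends its reasoning with <answer>0.73</answer>.
PROB_PATTERN = re.compile(r"<answer>\s*([01](?:\.\d+)?)\s*</answer>")


def brier_reward(completion: str, resolved_yes: bool,
                 invalid_penalty: float = -1.0) -> float:
    """Negative Brier score of the parsed forecast (illustrative assumption).

    Ranges from 0 (confident and right) to -1 (confident and wrong).
    Missing or out-of-range answers receive invalid_penalty.
    """
    match = PROB_PATTERN.search(completion)
    if match is None:
        return invalid_penalty
    p = float(match.group(1))
    if not 0.0 <= p <= 1.0:
        return invalid_penalty
    y = 1.0 if resolved_yes else 0.0
    return -(p - y) ** 2
```

Because the Brier score is proper, a model maximizes its expected reward by reporting its true belief. This property is what makes such rewards compatible with the calibration gains described in the abstract.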
Taxonomy
Methods: Sparse Evolutionary Training
