LLMs Can Teach Themselves to Better Predict the Future

Benjamin Turtel, Danny Franklin, and Philipp Schoenegger

arXiv:2502.05253 · cs.CL · February 11, 2025

TL;DR

This paper introduces a self-supervised fine-tuning method for large language models that improves their forecasting accuracy by using model-generated reasoning trajectories and outcome-based ranking, without human-labeled data.

Contribution

The authors develop an outcome-driven self-play framework combined with Direct Preference Optimization to enhance LLM forecasting abilities without human reasoning samples.
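As a rough illustration of the Direct Preference Optimization step, the sketch below computes the standard per-pair DPO loss from summed log-probabilities of a preferred and a dispreferred response under the trained policy and a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example; the listing does not state the authors' hyperparameters or training code.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard per-pair DPO loss: -log(sigmoid(beta * margin)).

    All names and the default beta are illustrative assumptions,
    not values taken from the paper.
    """
    # How much more the policy favors the chosen response than the
    # reference model does, relative to the rejected response.
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Raising the chosen response's log-probability lowers the loss.
improved = dpo_loss(-10.0, -15.0, -12.0, -15.0)
```

The loss falls as the policy assigns relatively more probability to the preferred trace than the reference model does, which is what pushes the model toward the better-ranked reasoning.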

Findings

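01 Increases the prediction accuracy of Phi-4 14B and DeepSeek-R1 14B by 7-10% on a separate test set, relative to both the base model and a DPO fine-tuned control model with randomized labels.

02 Achieves forecasting performance comparable to much larger frontier models such as GPT-4o.

03 Does not rely on human-curated reasoning data.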

Abstract

We present an outcome-driven fine-tuning framework that enhances the forecasting capabilities of large language models (LLMs) without relying on human-curated reasoning samples. Our method leverages model self-play to generate pairs of diverse reasoning trajectories and probabilistic forecasts for a set of diverse questions that resolve after the models' knowledge cutoff date. We then rank pairs of these reasoning traces by their distance to the actual outcomes before fine-tuning the model via Direct Preference Optimization (DPO). On a separate test set, our approach increases the prediction accuracy of Phi-4 14B and DeepSeek-R1 14B by 7-10% over a base model and a DPO fine-tuned control model with randomized labels, bringing them on par with the forecasting capabilities of much larger frontier models like GPT-4o.
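The ranking step described in the abstract can be sketched as follows: given two self-generated traces for the same question and the resolved outcome, the trace whose forecast is closer to the outcome becomes the preferred example for DPO. This is a minimal sketch under stated assumptions: the `Trace` class, the `make_preference_pair` helper, the use of absolute error as the distance, and the choice to drop exact ties are all hypothetical details introduced here, not confirmed by the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Trace:
    """One self-generated reasoning trace with its forecast (hypothetical type)."""
    reasoning: str
    probability: float  # forecast probability that the question resolves "yes"


def make_preference_pair(a: Trace, b: Trace,
                         outcome: int) -> Optional[Tuple[Trace, Trace]]:
    """Return (chosen, rejected), ranked by distance to the resolved outcome.

    `outcome` is 1 if the question resolved "yes" and 0 otherwise.
    Absolute error is an assumed distance measure for this sketch.
    """
    err_a = abs(a.probability - outcome)
    err_b = abs(b.probability - outcome)
    if err_a == err_b:
        # Assumption: a tie carries no preference signal, so the pair is skipped.
        return None
    return (a, b) if err_a < err_b else (b, a)


# Usage: the question resolved "yes", so the 0.8 forecast is preferred.
pair = make_preference_pair(
    Trace("Polling and base rates favor the event.", 0.8),
    Trace("Recent news suggests the event is unlikely.", 0.3),
    outcome=1,
)
chosen, rejected = pair
```

Pairs built this way need no human-written reasoning: the only supervision signal is how each question actually resolved.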

Code & Models

Repositories

zhaoolee/garss
pytorch

Models

LightningRodLabs/foresight-32B
model · 133 downloads · 8 likes

Taxonomy

Topics: Research Data Management Practices

Methods: Direct Preference Optimization · Balanced Selection · Sparse Evolutionary Training