Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation

Srujan P Mule; Aniketh Garikaparthi; Manasi Patwardhan

arXiv:2605.21491·cs.LG·May 22, 2026

TL;DR

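This paper studies whether language models can be trained to forecast which of two research ideas will perform better on a benchmark before any experiments are run. It uses a dataset of 11,488 idea pairs with SFT and RLVR training, with the aim of filtering AI-generated ideas without exhaustive experimentation.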

Contribution

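It introduces a dataset of 11,488 idea pairs, grounded in PapersWithCode outcomes, for comparative empirical forecasting of research ideas. It shows that trained 8B-parameter language models can predict which idea will perform better, with RLVR framing the prediction as a reasoning task for interpretability.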

Findings

01. SFT raises the accuracy of 8B-parameter models from 30% to 77.1%, outperforming GPT-5 (61.1%).

02. Reinforcement Learning with Verifiable Rewards (RLVR) achieves 71.35% accuracy.

03. Models transfer well across domains and time splits.

Abstract

As language models accelerate scientific research by automating hypothesis generation and implementation, a new bottleneck emerges: evaluating and filtering hundreds of AI-generated ideas without exhaustive experimentation. We ask whether LMs can learn to forecast the empirical success of research ideas before any experiments are run. We study comparative empirical forecasting: given a benchmark-specific research goal and two candidate ideas, predict which will achieve better benchmark performance. We construct a dataset of 11,488 idea pairs grounded in objective outcomes from PapersWithCode. While off-the-shelf 8B-parameter models struggle (30% acc.), SFT dramatically boosts performance to 77.1%, outperforming GPT-5 (61.1%). By framing evaluation as a reasoning task via Reinforcement Learning with Verifiable Rewards (RLVR), we train models to discover latent reasoning paths, achieving…
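
The comparative forecasting task and its verifiable reward can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `IdeaPair` fields, the `Answer: A|B` output format, and the choice to score unparseable outputs as incorrect are all assumptions made for illustration. The sketch shows a binary reward that is 1.0 when the model's final choice matches the recorded benchmark outcome, and an accuracy that is the mean of that reward over pairs.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IdeaPair:
    """One comparison example (field names are hypothetical)."""
    goal: str      # benchmark-specific research goal
    idea_a: str    # first candidate idea
    idea_b: str    # second candidate idea
    label: str     # "A" or "B": the idea with better benchmark performance


# Assumed output convention: the model ends its reasoning with "Answer: A" or "Answer: B".
ANSWER_RE = re.compile(r"answer\s*:\s*([AB])\b", re.IGNORECASE)


def parse_choice(output: str) -> Optional[str]:
    """Return the last stated choice ("A" or "B"), or None if none is found."""
    matches = ANSWER_RE.findall(output)
    return matches[-1].upper() if matches else None


def verifiable_reward(output: str, pair: IdeaPair) -> float:
    """Binary reward: 1.0 if the parsed choice equals the label, else 0.0.

    Unparseable outputs receive 0.0 (an assumption of this sketch).
    """
    return 1.0 if parse_choice(output) == pair.label else 0.0


def pairwise_accuracy(outputs: Sequence[str], pairs: Sequence[IdeaPair]) -> float:
    """Mean reward over all pairs, i.e. the fraction of correct forecasts."""
    if len(outputs) != len(pairs):
        raise ValueError("outputs and pairs must have the same length")
    if not pairs:
        return 0.0
    return sum(verifiable_reward(o, p) for o, p in zip(outputs, pairs)) / len(pairs)
```

Because the reward depends only on the final choice and a recorded outcome, it can be checked automatically without a learned judge, which is the property that the term "verifiable" refers to here.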
