SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?
Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskouie, Shayan Shabihi, Erich Liang, Andrea Toledo, Guillermo Mangialardi, Sergio Fonrouge, Ed-Yeremai Hernandez Cardona, Paula Vergara, Utkarsh Tyagi, Chen Bo Calvin Zhang, Pavi Bhatter, Nicholas Johnson, Furong Huang

TL;DR
SciPredict is a benchmark that evaluates how well LLMs predict the outcomes of scientific experiments. It exposes the limitations of current models and shows that they need to recognize when their own predictions can be trusted.
Contribution
The paper presents SciPredict, a new benchmark of 405 tasks across physics, biology, and chemistry that assesses how accurately and how reliably LLMs predict the outcomes of scientific experiments.
Findings
Model accuracies are 14-26%, close to human experts' 20%.
Models cannot consistently tell their reliable predictions from their unreliable ones.
Human experts' accuracy rises from 5% to 80% as they rate outcomes as more predictable.
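The relationship between accuracy and judged predictability in the findings above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function name `accuracy_by_rating`, the 1-3 rating scale, and the toy records are all assumptions made up for illustration, and none of the numbers come from SciPredict.

```python
from collections import defaultdict

def accuracy_by_rating(records):
    """Return accuracy per predictability rating.

    `records` is an iterable of (rating, is_correct) pairs, where `rating`
    is a hypothetical self-assessed predictability score.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rating, is_correct in records:
        totals[rating] += 1
        hits[rating] += int(is_correct)
    return {rating: hits[rating] / totals[rating] for rating in sorted(totals)}

# Made-up toy records, NOT data from the paper:
# (self-rated predictability on an assumed 1-3 scale, whether the prediction was correct)
toy_records = [
    (1, False), (1, False), (1, False), (1, True),
    (2, True), (2, False),
    (3, True), (3, True), (3, True), (3, False),
]

per_rating = accuracy_by_rating(toy_records)
print(per_rating)  # {1: 0.25, 2: 0.5, 3: 0.75}
```

A predictor whose accuracy climbs with its own rating, as in this toy data, is giving a usable signal about which predictions to trust. A flat profile across ratings would mean the rating carries no such information.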
Abstract
Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes before committing resources to costly physical validation. While existing benchmarks evaluate LLMs on scientific knowledge and reasoning, their ability to predict experimental outcomes - a task where AI could significantly exceed human capabilities - remains largely underexplored. We introduce SciPredict, a benchmark comprising 405 tasks derived from recent empirical studies in 33 specialized sub-fields of physics, biology, and chemistry. SciPredict addresses two critical questions: (a) can LLMs predict the outcome of scientific experiments with sufficient accuracy? and (b) can such predictions be reliably used in the scientific research process? Evaluations reveal fundamental limitations on both fronts. Model accuracies are 14-26% and human expert performance is…
