FrontierScience: Evaluating AI's Ability to Perform Expert-Level Scientific Tasks
Miles Wang, Robi Lin, Kat Hu, Joy Jiao, Neil Chowdhury, Ethan Chang, Tejal Patwardhan

TL;DR
FrontierScience is a benchmark that evaluates AI models on expert-level scientific reasoning across physics, chemistry, and biology through two tracks: Olympiad problems and open-ended research problems.
Contribution
The paper introduces a multi-faceted evaluation framework built on real-world, PhD-level scientific problems and a detailed rubric-based assessment method.
Findings
Models show significant progress but still struggle with complex scientific reasoning.
The benchmark reveals gaps in AI's ability to perform at expert scientific levels.
It provides a new standard for measuring scientific reasoning in AI.
Abstract
We introduce FrontierScience, a benchmark evaluating expert-level scientific reasoning in frontier language models. Recent model progress has nearly saturated existing science benchmarks, which often rely on multiple-choice knowledge questions or already published information. FrontierScience addresses this gap through two complementary tracks: (1) Olympiad, consisting of international olympiad problems at the level of IPhO, IChO, and IBO, and (2) Research, consisting of PhD-level, open-ended problems representative of sub-tasks in scientific research. FrontierScience contains several hundred questions (including 160 in the open-sourced gold set) covering subfields across physics, chemistry, and biology, from quantum electrodynamics to synthetic organic chemistry. All Olympiad problems are originally produced by international Olympiad medalists and national team coaches to ensure…
Taxonomy
Topics: Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education · Scientific Computing and Data Management
