Improving Data and Reward Design for Scientific Reasoning in Large Language Models

Zijie Chen; Zhenghao Lin; Xiao Liu; Zhenzhong Lan; Yeyun Gong; Peng Cheng

arXiv:2602.08321·cs.CL·February 11, 2026

TL;DR

This paper introduces Dr. SCI, a dataset of 1M questions across eight STEM subjects, together with a redesigned SFT -> RL post-training pipeline that improves how large language models handle open-ended science questions.

Contribution

It presents the Dr. SCI dataset and a new post-training approach with three key components to enhance scientific reasoning capabilities.

Findings

01

Qwen3-4B-Base trained with the Dr. SCI pipeline achieves 63.2 on GPQA-diamond

02

Model outperforms strong baselines like o1-mini and GPT-4o

03

Demonstrates substantial gains in open-ended scientific reasoning

Abstract

Solving open-ended science questions remains challenging for large language models, particularly due to inherently unreliable supervision and evaluation. The bottleneck lies in the data construction and reward design for scientific post-training. We develop a large-scale, systematic data processing pipeline that transforms heterogeneous open-source science data into the Dr. SCI dataset, which comprises 1M questions across eight STEM subjects, with explicit verifiable/open-ended splits, scalable difficulty annotation, and fine-grained rubrics that operationalize evaluation for open-ended answers. Building on this dataset, we propose the Dr. SCI post-training pipeline, which redesigns the standard SFT -> RL workflow through three components: (i) Exploration-Expanding SFT, which broadens the model's reasoning pattern coverage prior to RL; (ii) Dynamic Difficulty Curriculum, which adapts…
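
To make the verifiable/open-ended split and the role of rubrics more concrete, here is a minimal sketch of how a reward could be computed per question. It is an illustration under stated assumptions, not the authors' implementation: the `Question` and `RubricItem` schemas, the field names, the 0-to-1 difficulty scale, and the `keyword_judge` stand-in (a real system would likely use a model-based grader) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RubricItem:
    # Hypothetical schema: one criterion an open-ended answer should satisfy.
    criterion: str
    weight: float = 1.0


@dataclass
class Question:
    # Hypothetical record layout; the real Dr. SCI fields may differ.
    text: str
    subject: str
    split: str          # "verifiable" or "open_ended"
    difficulty: float   # assumed scale from 0 (easy) to 1 (hard)
    reference: str = ""
    rubric: List[RubricItem] = field(default_factory=list)


def reward(q: Question, answer: str, judge: Callable[[str, str], bool]) -> float:
    """Return a reward in [0, 1] for one answer."""
    if q.split == "verifiable":
        # Verifiable questions: normalized exact match against the reference.
        return float(answer.strip().lower() == q.reference.strip().lower())
    # Open-ended questions: weighted fraction of rubric criteria judged as met.
    total = sum(item.weight for item in q.rubric)
    if total == 0:
        return 0.0
    met = sum(item.weight for item in q.rubric if judge(item.criterion, answer))
    return met / total


def keyword_judge(criterion: str, answer: str) -> bool:
    # Toy stand-in for a grader: the criterion counts as met if it appears verbatim.
    return criterion.lower() in answer.lower()


if __name__ == "__main__":
    q = Question(
        text="Why can heat not flow spontaneously from cold to hot?",
        subject="physics",
        split="open_ended",
        difficulty=0.4,
        rubric=[RubricItem("entropy", 2.0), RubricItem("second law", 1.0)],
    )
    print(reward(q, "Total entropy would decrease.", keyword_judge))
```

The point of the design is that a single reward function can serve both splits: exact checking where a reference answer exists, and rubric coverage where it does not.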

Code & Models

Datasets

MiniByte-666/Dr.SCI
dataset · 137 dl

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Multimodal Machine Learning Applications