Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning
Ming Yin, Yuanhao Qu, Ling Yang, Le Cong, Mengdi Wang

TL;DR
This paper presents an approach for training large language models in scientific reasoning for genomics, using expert discussions as the learning signal and reinforcement learning as the training method. The resulting model improves on the base model by over 15% on Genome-Bench.
Contribution
The paper introduces an automated pipeline and the Genome-Bench benchmark for teaching LLMs scientific reasoning from expert discussions, which it describes as a first in the field.
Findings
Reinforcement learning improves LLM performance by over 15% on Genome-Bench relative to the base model.
The pipeline effectively transforms scientific discussions into training data.
The approach shows potential for generalization across scientific domains.
Abstract
We investigate how to teach large language models (LLMs) to perform scientific reasoning by leveraging expert discussions as a learning signal. Focusing on the genomics domain, we develop an automated pipeline to extract trainable data and introduce Genome-Bench, a new benchmark constructed from over a decade of scientific forum discussions on genome engineering. Our pipeline transforms raw interactions into a reinforcement learning-friendly multiple-choice question (MCQ) format, supported by 3000+ high-quality question-answer pairs spanning foundational biology, experimental troubleshooting, tool usage, and beyond. We fine-tune an LLM using reinforcement learning (RL) with a rule-based reward signal derived from the synthetic MCQ dataset to enhance domain-specific reasoning. Our results show that reinforcement learning from scientific discussions improves model performance by over 15% compared to the base model on…
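The rule-based reward mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `<answer>X</answer>` output convention, the function names `extract_choice` and `mcq_reward`, and the binary 1.0/0.0 reward values are all assumptions made here for illustration, since the summary does not specify them.

```python
import re
from typing import Optional

# Hypothetical output convention (an assumption, not taken from the paper):
# the model ends its reasoning with "<answer>X</answer>", X being one option letter.
ANSWER_PATTERN = re.compile(r"<answer>\s*([A-Z])\s*</answer>", re.IGNORECASE)


def extract_choice(completion: str) -> Optional[str]:
    """Return the last option letter the completion commits to, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1].upper() if matches else None


def mcq_reward(completion: str, correct_choice: str) -> float:
    """Rule-based reward: 1.0 if the extracted option matches the key, else 0.0."""
    choice = extract_choice(completion)
    if choice is None:
        # No parseable answer tag: treated as incorrect in this sketch.
        return 0.0
    return 1.0 if choice == correct_choice.strip().upper() else 0.0
```

Because the reward depends only on string matching against the answer key, it needs no learned reward model, which is one reason a multiple-choice format is convenient for RL fine-tuning.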
Taxonomy
Topics: Artificial Intelligence in Law
