PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

James Burgess; Jan N. Hansen; Duo Peng; Yuhui Zhang; Alejandro Lozano; Min Woo Sun; Emma Lundberg; Serena Yeung-Levy

arXiv:2601.18207 · cs.LG · January 27, 2026

TL;DR

This paper introduces PaperSearchQA, a factoid QA dataset of 60k samples paired with a search corpus of 16 million biomedical paper abstracts, for training search agents that reason over scientific literature using reinforcement learning with verifiable rewards (RLVR). Agents trained in this environment outperform non-RL baselines.

Contribution

The paper provides a search corpus of 16 million biomedical paper abstracts, a challenging factoid QA dataset (PaperSearchQA, 60k samples answerable from the corpus), and benchmarks for training and evaluating RL-based scientific paper search agents.

Findings

01 Agents outperform non-RL baselines in retrieval tasks

02 Agents exhibit planning, reasoning, and self-verification behaviors

03 Scalable data creation methods extendable to other scientific domains

Abstract

Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their relevance to technical AI systems in science, engineering, and medicine. In this work we propose training agents to search and reason over scientific papers -- this tests technical question-answering, it is directly relevant to real scientists, and the capabilities will be crucial to future AI Scientist systems. Concretely, we release a search corpus of 16 million biomedical paper abstracts and construct a challenging factoid QA dataset called PaperSearchQA with 60k samples answerable from the corpus, along with benchmarks. We train search agents in this environment to outperform non-RL…
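The idea of supervising only final-answer accuracy can be sketched as a binary exact-match reward over a factoid answer. This is a minimal illustration, not the paper's reward function: the `<answer>` tag convention, the normalization rules, and the function names `normalize_answer`, `extract_final_answer`, and `exact_match_reward` are all assumptions made for this example.

```python
import re
import string
from typing import List, Optional


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_final_answer(rollout: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> span, if any.

    The <answer> tag format is a hypothetical convention for this sketch.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", rollout, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def exact_match_reward(rollout: str, gold_answers: List[str]) -> float:
    """Return 1.0 if the normalized final answer matches any gold answer, else 0.0.

    Only the final answer is scored; intermediate reasoning and search
    calls in the rollout receive no direct supervision.
    """
    predicted = extract_final_answer(rollout)
    if predicted is None:
        return 0.0
    normalized = normalize_answer(predicted)
    return float(any(normalized == normalize_answer(g) for g in gold_answers))


if __name__ == "__main__":
    rollout = "<think>search, read, verify</think><answer>The BRCA1 gene.</answer>"
    print(exact_match_reward(rollout, ["BRCA1 gene"]))  # prints 1.0
```

A reward of this shape is verifiable because it needs only the gold answer string, with no learned judge. Whether the paper uses exact match, a softer match, or additional format rewards is not stated in the text above.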

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems