PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, Serena Yeung-Levy

TL;DR
This paper introduces PaperSearchQA, a factoid QA dataset of 60k samples paired with a search corpus of 16 million biomedical paper abstracts, for training search agents to search and reason over scientific literature using reinforcement learning with verifiable rewards (RLVR). Agents trained in this environment outperform non-RL baselines.
Contribution
It provides a new biomedical paper corpus, a challenging QA dataset, and benchmarks for training and evaluating RL-based scientific paper search agents.
Findings
Search agents trained with RLVR outperform non-RL baselines in retrieval tasks
Agents exhibit planning, reasoning, and self-verification behaviors
The data creation methods are scalable and can be extended to other scientific domains
Abstract
Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their relevance to technical AI systems in science, engineering, and medicine. In this work we propose training agents to search and reason over scientific papers -- this tests technical question-answering, it is directly relevant to real scientists, and the capabilities will be crucial to future AI Scientist systems. Concretely, we release a search corpus of 16 million biomedical paper abstracts and construct a challenging factoid QA dataset called PaperSearchQA with 60k samples answerable from the corpus, along with benchmarks. We train search agents in this environment to outperform non-RL…
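The abstract notes that RLVR supervises only final answer accuracy. As a rough illustration of what such a verifiable reward could look like for factoid QA, here is a minimal sketch. It is not the paper's implementation: the `<answer>` tag format, the normalization rules, and the function names `normalize`, `extract_answer`, and `exact_match_reward` are all assumptions made for this example, and the sample question content is invented.

```python
import re
import string
from typing import List, Optional


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_answer(rollout: str) -> Optional[str]:
    """Return the text of the last <answer>...</answer> pair, or None if absent.

    The <answer> tag convention is an assumption of this sketch.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", rollout, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def exact_match_reward(rollout: str, gold_answers: List[str]) -> float:
    """Binary reward: 1.0 if the final answer matches any gold answer, else 0.0.

    Only the final answer is scored; intermediate reasoning and search
    calls in the rollout receive no direct supervision.
    """
    predicted = extract_answer(rollout)
    if predicted is None:
        return 0.0
    pred = normalize(predicted)
    return 1.0 if any(pred == normalize(g) for g in gold_answers) else 0.0


# Hypothetical rollout: reasoning and search steps, then a final answer.
rollout = (
    "<think>I should look up the gene.</think>"
    "<search>BRCA1 function</search>"
    "<answer>The BRCA1 gene.</answer>"
)
reward = exact_match_reward(rollout, ["BRCA1 gene"])
```

Because the reward depends only on the extracted final answer, the agent is free to choose how many searches to issue and how to reason between them; a stricter or looser matching rule (for example, token-level F1) would be a drop-in replacement for `exact_match_reward`.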
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert Finding and Q&A Systems
