OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D'arcy, David Wadden, Matt Latzke, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke Zettlemoyer, Graham Neubig

TL;DR
OpenScholar is a retrieval-augmented language model designed to synthesize scientific literature accurately, outperforming existing models in correctness and citation accuracy across multiple scientific domains, with human experts preferring its responses.
Contribution
We introduce OpenScholar, a specialized retrieval-augmented LM for scientific literature synthesis, and develop ScholarQABench, a large-scale benchmark for evaluating literature search and synthesis.
Findings
OpenScholar-8B outperforms GPT-4o by 5% in correctness on ScholarQABench.
OpenScholar achieves citation accuracy comparable to human experts.
Applying the OpenScholar pipeline to GPT-4o improves GPT-4o's correctness by 12%.
Abstract
Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience, and biomedicine. On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model. While GPT-4o hallucinates citations 78 to 90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's datastore,…
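To make the retrieve-then-cite idea in the abstract concrete, here is a minimal toy sketch under loudly stated assumptions. It is not OpenScholar's retriever or generator: lexical token overlap stands in for retrieval, quoting the retrieved passages stands in for LM synthesis, and the citation check only confirms that every `[n]` marker points at a retrieved passage (a simple proxy, not ScholarQABench's citation-accuracy metric). The corpus `PASSAGES` and the functions `retrieve`, `synthesize`, and `citations_valid` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the real datastore covers passages from
# 45 million open-access papers.
PASSAGES = [
    "Retrieval-augmented language models ground generation in retrieved text.",
    "Dense retrievers embed queries and passages into a shared vector space.",
    "Protein folding can be predicted from amino acid sequences.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, passages: list[str], k: int = 2) -> list[int]:
    """Toy retriever: rank passages by token overlap with the query."""
    q = Counter(tokenize(query))
    scores = [sum((q & Counter(tokenize(p))).values()) for p in passages]
    # Highest score first; sorted() is stable, so ties keep corpus order.
    ranked = sorted(range(len(passages)), key=lambda i: -scores[i])
    # Keep at most k passages and drop those with no overlap at all.
    return [i for i in ranked[:k] if scores[i] > 0]


def synthesize(query: str, passages: list[str], k: int = 2) -> tuple[str, list[int]]:
    """Stand-in for LM generation: quote each retrieved passage with a [n] citation."""
    ids = retrieve(query, passages, k)
    sentences = [f"{passages[i].rstrip('.')} [{n}]." for n, i in enumerate(ids, start=1)]
    return " ".join(sentences), ids


def citations_valid(answer: str, num_sources: int) -> bool:
    """True if the answer cites something and every [n] refers to a retrieved source."""
    cited = [int(m) for m in re.findall(r"\[(\d+)\]", answer)]
    return bool(cited) and all(1 <= c <= num_sources for c in cited)


answer, ids = synthesize("How do retrieval-augmented language models use dense retrievers?", PASSAGES)
print(ids)
print(answer)
print(citations_valid(answer, len(ids)))
```

In this sketch every sentence of the answer carries a citation by construction; a real generator produces free text, which is why checking that citations resolve to retrieved sources matters.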
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
