Language agents achieve superhuman synthesis of scientific knowledge
Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, and Andrew D. White

TL;DR
This paper introduces PaperQA2, a language model agent that matches or exceeds human experts on scientific literature tasks such as retrieval, summarization, and contradiction detection. This marks a significant advance in scientific knowledge synthesis.
Contribution
The paper presents PaperQA2, a new language model agent optimized for factual accuracy, and LitQA2, a benchmark for scientific literature research. PaperQA2 achieves superhuman performance on scientific literature research tasks.
Findings
PaperQA2 matches or exceeds expert performance on literature search tasks.
PaperQA2 writes cited scientific summaries that are more accurate than existing human-written Wikipedia articles.
PaperQA2 identifies contradictions in scientific papers, and human reviewers validated 70% of the contradictions it flagged.
Abstract
Language models are known to hallucinate incorrect information, and it is unclear if they are sufficiently accurate and reliable for use in scientific research. We developed a rigorous human-AI comparison methodology to evaluate language model agents on real-world literature search tasks covering information retrieval, summarization, and contradiction detection tasks. We show that PaperQA2, a frontier language model agent optimized for improved factuality, matches or exceeds subject matter expert performance on three realistic literature research tasks without any restrictions on humans (i.e., full access to internet, search tools, and time). PaperQA2 writes cited, Wikipedia-style summaries of scientific topics that are significantly more accurate than existing, human-written Wikipedia articles. We also introduce a hard benchmark for scientific literature research called LitQA2 that…
