Introducing Answered with Evidence -- a framework for evaluating whether LLM responses to biomedical questions are founded in evidence
Julian D Baldwin, Christina Dinh, Arjun Mukerji, Neil Sanghavi, Saurabh Gombar

TL;DR
This paper introduces Answered with Evidence, a framework for evaluating whether LLM responses to biomedical questions are supported by scientific literature, and shows that combining multiple retrieval systems increases the share of questions that receive evidence-supported answers.
Contribution
It presents an evaluation framework and compares retrieval-augmented systems, showing that combining evidence sources increases the share of biomedical questions answered with evidence support.
Findings
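PubMed-based systems provided evidence-supported answers for approximately 44% of questions
The novel evidence source (Alexandria) provided evidence-supported answers for about 50% of questions
Combined, these sources enabled reliable answers to over 70% of biomedical queries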
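The relationship between the per-source and combined figures above can be illustrated with a small sketch. It assumes that a question counts as reliably answered when at least one source supports it, which the summary does not state explicitly. The function name `support_rates`, the source labels, and the toy flags are hypothetical and are not data from the paper.

```python
from typing import Dict, List


def support_rates(flags: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return per-source and combined fractions of questions with evidence support.

    flags maps a source name to one boolean per question, True when that
    source produced an evidence-supported answer.
    """
    n_questions = len(next(iter(flags.values())))
    if any(len(v) != n_questions for v in flags.values()):
        raise ValueError("every source needs exactly one flag per question")

    rates = {name: sum(v) / n_questions for name, v in flags.items()}

    # Assumption: a question is covered if at least one source supports it.
    combined = [any(per_question) for per_question in zip(*flags.values())]
    rates["combined"] = sum(combined) / n_questions
    return rates


# Toy flags for 10 hypothetical questions (illustrative only).
toy = {
    "pubmed_based": [True, True, True, True, False, False, False, False, False, False],
    "observational": [False, False, True, True, True, True, True, False, False, False],
}
rates = support_rates(toy)
print(rates)
```

Under the same assumption, and assuming all figures refer to the same question set, inclusion-exclusion gives combined = A + B - overlap. With roughly 44% and 50% per source and more than 70% combined, the two sources would jointly support fewer than about 24% of questions, so most of their coverage would be complementary.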
Abstract
The growing use of large language models (LLMs) for biomedical question answering raises concerns about the accuracy and evidentiary support of their responses. To address this, we present Answered with Evidence, a framework for evaluating whether LLM-generated answers are grounded in scientific literature. We analyzed thousands of physician-submitted questions using a comparative pipeline that included: (1) Alexandria, formerly known as the Atropos Evidence Library, a retrieval-augmented generation (RAG) system based on novel observational studies, and (2) two PubMed-based retrieval-augmented systems (System and Perplexity). We found that PubMed-based systems provided evidence-supported answers for approximately 44% of questions, while the novel evidence source did so for about 50%. Combined, these sources enabled reliable answers to over 70% of biomedical queries. As LLMs become increasingly…
