Introducing Answered with Evidence -- a framework for evaluating whether LLM responses to biomedical questions are founded in evidence

Julian D Baldwin; Christina Dinh; Arjun Mukerji; Neil Sanghavi; Saurabh Gombar

arXiv:2507.02975·cs.LG·July 8, 2025

TL;DR

This paper introduces Answered with Evidence, a framework for evaluating whether LLM responses to biomedical questions are supported by scientific literature. Combining PubMed-based retrieval systems with a novel observational-study evidence source raised the share of questions with reliable answers to over 70%.

Contribution

It presents an evaluation framework and compares retrieval-augmented systems on physician-submitted questions, showing that combining evidence sources increases the share of questions that receive evidence-supported answers.

Findings

01

PubMed-based systems provided evidence-supported answers for approximately 44% of questions

02

The novel evidence source (Alexandria) provided evidence-supported answers for about 50% of questions

03

Combined, the sources enabled reliable answers to over 70% of biomedical queries
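
The three figures above are linked by simple set arithmetic. The sketch below assumes that "combined" means a question counts as answered when at least one source supports it, and it treats the rounded percentages as exact; neither assumption is confirmed by the summary, and the function names are hypothetical. Under those assumptions, inclusion-exclusion bounds how much the two sources can overlap.

```python
def union_bounds(pct_a: int, pct_b: int) -> tuple[int, int]:
    """Range of possible combined coverage (percent) for two sources.

    Lower bound: one source's answered questions are a subset of the other's.
    Upper bound: the two sources answer disjoint sets of questions.
    """
    return max(pct_a, pct_b), min(100, pct_a + pct_b)


def max_overlap(pct_a: int, pct_b: int, pct_union_at_least: int) -> int:
    """Largest overlap (percent) consistent with a combined-coverage floor.

    Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|.
    """
    return pct_a + pct_b - pct_union_at_least


# Rounded figures taken from the findings; treated as exact (assumption).
PUBMED_PCT = 44
ALEXANDRIA_PCT = 50
COMBINED_FLOOR = 70  # "over 70%" used as a lower bound

print(union_bounds(PUBMED_PCT, ALEXANDRIA_PCT))
print(max_overlap(PUBMED_PCT, ALEXANDRIA_PCT, COMBINED_FLOOR))
```

Read this way, a combined figure above 70% would imply that fewer than 24% of questions were supported by both source types, which suggests the sources are largely complementary rather than redundant.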

Abstract

The growing use of large language models (LLMs) for biomedical question answering raises concerns about the accuracy and evidentiary support of their responses. To address this, we present Answered with Evidence, a framework for evaluating whether LLM-generated answers are grounded in scientific literature. We analyzed thousands of physician-submitted questions using a comparative pipeline that included: (1) Alexandria, formerly known as the Atropos Evidence Library, a retrieval-augmented generation (RAG) system based on novel observational studies, and (2) two PubMed-based retrieval-augmented systems (System and Perplexity). We found that PubMed-based systems provided evidence-supported answers for approximately 44% of questions, while the novel evidence source did so for about 50%. Combined, these sources enabled reliable answers to over 70% of biomedical queries. As LLMs become increasingly…
