A Dataset of Medical Questions Paired with Automatically Generated Answers and Evidence-supported References

Deepak Gupta; Davis Bartels; Dina Demner-Fushman

PMC · DOI: 10.1038/s41597-025-05233-z · June 19, 2025

Open Access

TL;DR

This paper introduces MedAESQA, a dataset for evaluating and improving medical question-answering systems by linking answers to evidence from scientific sources.

Contribution

The main contribution is a dataset of medical questions and answers in which the answers are linked to supporting scientific evidence, intended for evaluating factual accuracy.

Findings

01. The dataset includes 40 deidentified medical questions with 30 human and LLM-generated answers each.

02. Each answer statement is linked to a scientific abstract, with manual judgments on accuracy and relevance.

03. The dataset supports the development of models that can attribute facts to reliable sources.
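
The structure described in these findings (questions, answers, statements, linked abstracts, and manual judgments) could be modeled roughly as below. This is only an illustrative sketch: the class names, field names, and the `supported_fraction` helper are hypothetical and are not taken from the MedAESQA release, and the toy values are placeholders rather than dataset content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """One answer statement plus its cited evidence (hypothetical schema)."""
    text: str
    abstract_ids: List[str] = field(default_factory=list)  # IDs of linked abstracts
    is_accurate: bool = False        # manual judgment on the statement's accuracy
    evidence_relevant: bool = False  # manual judgment on the abstract's relevance


@dataclass
class Answer:
    source: str  # assumed label, e.g. "human" or "llm"
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Question:
    text: str
    answers: List[Answer] = field(default_factory=list)


def supported_fraction(answer: Answer) -> float:
    """Share of statements judged accurate and linked to a relevant abstract."""
    if not answer.statements:
        return 0.0
    supported = sum(
        1 for s in answer.statements
        if s.is_accurate and s.abstract_ids and s.evidence_relevant
    )
    return supported / len(answer.statements)


# Toy example with placeholder text and IDs (not real dataset content).
example = Answer(
    source="llm",
    statements=[
        Statement("placeholder statement A", ["ABSTRACT-1"], True, True),
        Statement("placeholder statement B", [], False, False),
    ],
)
print(supported_fraction(example))  # 0.5
```

A per-answer score like this is one simple way to summarize attribution quality; other aggregations (for example, per question or per evidence source) would work equally well on the same structure.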

Abstract

New Large Language Model (LLM)-based approaches to medical Question Answering show unprecedented improvements in the fluency, grammaticality, and other qualities of the generated answers. However, the systems occasionally produce coherent, topically relevant, and plausible answers that are not based on facts and may be misleading and even harmful. New types of datasets are needed to evaluate the truthfulness of generated answers and develop reliable approaches for detecting answers that are not supported by evidence. The MedAESQA (Medical Attributable and Evidence Supported Question Answering) dataset presented in this work is designed for developing, fine-tuning, and evaluating language generation models for their ability to attribute or support the stated facts by linking the statements to the relevant passages of reliable sources. The dataset comprises 40 naturally occurring…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species (1)

Homo sapiens (human · species)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies