A Dataset of Medical Questions Paired with Automatically Generated Answers and Evidence-supported References
Deepak Gupta, Davis Bartels, Dina Demner-Fushman

TL;DR
This paper introduces MedAESQA, a dataset for evaluating and improving medical question-answering systems by linking answers to evidence from scientific sources.
Contribution
The contribution is a dataset of medical questions and answers in which the answers are linked to supporting scientific evidence, so that their factual accuracy can be evaluated.
Findings
The dataset includes 40 deidentified medical questions with 30 human and LLM-generated answers each.
Each answer statement is linked to a scientific abstract, with manual judgments on accuracy and relevance.
The dataset supports the development of models that can attribute facts to reliable sources.
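The structure described in these findings (questions, their answers, and answer statements that are each linked to an abstract and carry manual judgments) can be sketched as a small data model. This is only an illustration: every class and field name below is hypothetical, the judgments are assumed to be binary, and none of it reflects the released dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    text: str
    abstract_id: Optional[str]   # hypothetical ID of the linked scientific abstract
    accurate: bool               # manual accuracy judgment (assumed binary here)
    evidence_relevant: bool      # manual relevance judgment (assumed binary here)


@dataclass
class Answer:
    source: str                  # assumed labels such as "human" or "llm"
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Question:
    text: str
    answers: List[Answer] = field(default_factory=list)


def supported_fraction(answer: Answer) -> float:
    """Share of statements with a linked abstract judged both accurate and relevant."""
    if not answer.statements:
        return 0.0
    supported = sum(
        1
        for s in answer.statements
        if s.abstract_id is not None and s.accurate and s.evidence_relevant
    )
    return supported / len(answer.statements)


# Toy example with made-up content, not taken from the dataset.
example = Answer(
    source="llm",
    statements=[
        Statement("Statement backed by evidence.", "abstract-1", True, True),
        Statement("Statement with no linked evidence.", None, False, False),
    ],
)
print(supported_fraction(example))
```

A per-answer score like this is one simple way to summarize attribution quality. The choice to count a statement as supported only when both judgments are positive is an assumption made for this sketch.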
Abstract
New Large Language Model (LLM)-based approaches to medical Question Answering show unprecedented improvements in the fluency, grammaticality, and other qualities of the generated answers. However, the systems occasionally produce coherent, topically relevant, and plausible answers that are not based on facts and may be misleading and even harmful. New types of datasets are needed to evaluate the truthfulness of generated answers and develop reliable approaches for detecting answers that are not supported by evidence. The MedAESQA (Medical Attributable and Evidence Supported Question Answering) dataset presented in this work is designed for developing, fine-tuning, and evaluating language generation models for their ability to attribute or support the stated facts by linking the statements to the relevant passages of reliable sources. The dataset comprises 40 naturally occurring…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
