RAGCare-QA: A benchmark dataset for evaluating retrieval-augmented generation pipelines in theoretical medical knowledge

Jovana Dobreva; Ivana Karasmanakis; Filip Ivanisevic; Tadej Horvat; Dimitar Kitanovski; Matjaz Gams; Kostadin Mishev; Monika Simjanoska Misheva

PMC · DOI: 10.1016/j.dib.2025.112146 · October 9, 2025

Open Access

TL;DR

This paper introduces RAGCare-QA, a dataset of 420 medical questions for evaluating retrieval-augmented generation (RAG) systems in medical education.

Contribution

The novel contribution is a benchmark dataset for assessing RAG pipelines in theoretical medical knowledge across six specialties.

Findings

01

RAGCare-QA includes 420 questions across six medical specialties with three complexity levels.

02

The dataset categorizes questions by RAG implementation complexity (Basic, Multi-vector, Graph-enhanced).

03

It focuses on theoretical medical knowledge, supporting both medical education and the evaluation of RAG-based systems.

Abstract

The paper introduces RAGCare-QA, an extensive dataset of 420 theoretical medical knowledge questions for assessing Retrieval-Augmented Generation (RAG) pipelines in medical education and evaluation settings. The dataset consists of single-choice questions (exactly one correct answer each) from six medical specialties (Cardiology, Endocrinology, Gastroenterology, Family Medicine, Oncology, and Neurology) at three levels of complexity (Basic, Intermediate, and Advanced). Each question is also labeled with its best-fitting RAG implementation complexity level: Basic RAG (315 questions, 75.0 %), Multi-vector RAG (82 questions, 19.5 %), or Graph-enhanced RAG (23 questions, 5.5 %). The questions emphasize theoretical medical knowledge on fundamental concepts, pathophysiology, diagnostic criteria, and treatment principles important in medical education. The dataset is a useful tool for the assessment of RAG-based medical…

Linked entities

Species (1): Homo sapiens (human)

Figures (3)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare