MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

Juraj Vladika; Phillip Schneider; Florian Matthes

arXiv:2406.05845 · cs.CL · June 11, 2024 · 1 citation

Open Access

TL;DR

This paper evaluates the medical knowledge recall ability of large language models using a novel dataset derived from systematic reviews, highlighting the challenges in biomedical question answering.

Contribution

Introduces MedREQAL, a new dataset from systematic reviews, to assess LLMs' medical knowledge recall in question answering tasks.

Findings

1. LLMs show promising but limited performance on biomedical QA.
2. Medical knowledge recall remains a challenging task for current LLMs.
3. Datasets built from systematic reviews can effectively evaluate the medical knowledge encoded in LLMs.

Abstract

In recent years, Large Language Models (LLMs) have demonstrated an impressive ability to encode knowledge during pre-training on large text corpora. They can leverage this knowledge for downstream tasks like question answering (QA), even in complex areas involving health topics. Considering their high potential for facilitating clinical work in the future, understanding the quality of encoded medical knowledge and its recall in LLMs is an important step forward. In this study, we examine the capability of LLMs to exhibit medical knowledge recall by constructing a novel dataset derived from systematic reviews -- studies synthesizing evidence-based answers for specific medical questions. Through experiments on the new MedREQAL dataset, comprising question-answer pairs extracted from rigorous systematic reviews, we assess six LLMs, such as GPT and Mixtral, analyzing their classification…
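The abstract (truncated above) mentions analyzing the models' classification. As a minimal sketch only, assuming each question-answer pair carries a single gold verdict label and each model answer has already been mapped onto one such label, accuracy and macro-averaged F1 could be computed as follows. The label names and toy predictions are hypothetical illustrations, not MedREQAL's actual label set or reported results.

```python
from collections import Counter


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in sorted(set(gold) | set(pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only.
gold = ["supported", "refuted", "supported", "not_enough_info"]
pred = ["supported", "supported", "supported", "not_enough_info"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Macro-averaging weights every label equally, so a model that never predicts a rare verdict is penalized even when its overall accuracy looks high, as with the `refuted` label in the toy example.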

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Artificial Intelligence in Healthcare and Education

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Attention Dropout · Linear Layer · Multi-Head Attention · Dropout · Dense Connections · Cosine Annealing