MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge

Yuxuan Zhou; Xien Liu; Chen Ning; Ji Wu

arXiv:2406.02919·cs.CL·June 6, 2024

TL;DR

This paper introduces MultifacetEval, a comprehensive evaluation framework that reveals current large language models lack sufficient depth, precision, and coverage in mastering medical knowledge, highlighting their limited readiness for real-world medical applications.

Contribution

The paper develops a novel multifaceted evaluation framework and datasets to systematically assess LLMs' mastery of medical knowledge across multiple dimensions.

Findings

1. LLMs perform significantly worse on multifaceted medical questions than on standard benchmarks.
2. Current LLMs lack depth, precision, and coverage in medical knowledge.
3. LLMs are not yet suitable for real-world medical tasks.

Abstract

Large language models (LLMs) have excelled across domains and deliver notable performance on medical evaluation benchmarks such as MedQA. However, a significant gap remains between the reported performance and practical effectiveness in real-world medical scenarios. In this paper, we explore the causes of this gap by employing a multifaceted examination schema to systematically probe how well current LLMs actually master medical knowledge. Specifically, we develop a novel evaluation framework, MultifacetEval, to examine the degree and coverage of LLMs in encoding and mastering medical knowledge at multiple facets (comparison, rectification, discrimination, and verification) concurrently. Based on the MultifacetEval framework, we construct two multifaceted evaluation datasets: MultiDiseK (by producing questions from a clinical disease knowledge base) and…
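To make the idea of examining several facets "concurrently" concrete, here is a minimal Python sketch of how per-facet and joint scores could be tallied. Only the four facet names come from the abstract. The `FacetResult` record, the `knowledge_point` identifier, the toy data, and the strict rule that a knowledge point counts as mastered only when every facet is answered correctly are all assumptions made for illustration. They are not taken from the paper or its repository.

```python
from dataclasses import dataclass

# Facet names as listed in the abstract.
FACETS = ("comparison", "rectification", "discrimination", "verification")


@dataclass(frozen=True)
class FacetResult:
    """One graded model answer (hypothetical record layout)."""
    knowledge_point: str  # hypothetical identifier for a piece of medical knowledge
    facet: str            # one of FACETS
    correct: bool         # whether the model answered this question correctly


def per_facet_accuracy(results):
    """Accuracy computed separately for each facet that has at least one result."""
    totals = {f: 0 for f in FACETS}
    hits = {f: 0 for f in FACETS}
    for r in results:
        totals[r.facet] += 1
        hits[r.facet] += int(r.correct)
    return {f: hits[f] / totals[f] for f in FACETS if totals[f]}


def all_facet_accuracy(results):
    """Share of knowledge points answered correctly on every facet.

    Assumption: a missing facet counts as a failure for that knowledge point.
    """
    by_point = {}
    for r in results:
        by_point.setdefault(r.knowledge_point, {})[r.facet] = r.correct
    if not by_point:
        return 0.0
    mastered = sum(
        all(answers.get(f, False) for f in FACETS)
        for answers in by_point.values()
    )
    return mastered / len(by_point)


# Toy data: "kp1" is correct on all facets, "kp2" fails only rectification.
toy_results = (
    [FacetResult("kp1", f, True) for f in FACETS]
    + [FacetResult("kp2", f, f != "rectification") for f in FACETS]
)

facet_scores = per_facet_accuracy(toy_results)
joint_score = all_facet_accuracy(toy_results)
```

On this toy data, three facets score 1.0 and rectification scores 0.5, while the joint score is 0.5. Under the strict rule assumed here, the joint score can never exceed the lowest per-facet accuracy, which is one way a model can look strong on individual question types and still show shallow mastery.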

Code & Models

Repositories

thumlp/multifaceteval
PyTorch · Official

Taxonomy

Topics

Wikis in Education and Collaboration · Biomedical Text Mining and Ontologies · Genetics, Bioinformatics, and Biomedical Research