AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator

Zhihao Fan; Jialong Tang; Wei Chen; Siyuan Wang; Zhongyu Wei; Jun Xi; Fei Huang; Jingren Zhou

arXiv:2402.09742 · cs.CL · July 1, 2024 · 5 cites

TL;DR

This paper introduces AI Hospital, a multi-agent simulation framework and benchmark for evaluating large language models in realistic medical interactions, revealing current limitations and guiding future improvements.

Contribution

It presents a novel multi-agent simulation environment and the MVME benchmark for assessing LLMs in complex clinical scenarios with multi-turn interactions.

Findings

01. LLMs perform significantly worse in multi-turn interactions than in one-step tasks.
02. The dispute resolution mechanism improves diagnostic accuracy.
03. Current LLMs still have substantial gaps in clinical diagnostic capabilities.

Abstract

Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel in medical question-answering benchmarks. However, their real-world clinical application remains limited due to the complexities of doctor-patient interactions. To address this, we introduce AI Hospital, a multi-agent framework simulating dynamic medical interactions between a Doctor as player and NPCs including Patient, Examiner, and Chief Physician. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation (MVME) benchmark, utilizing high-quality Chinese medical records and NPCs to evaluate LLMs' performance in symptom collection, examination recommendations, and diagnoses. Additionally, a dispute resolution collaborative mechanism is proposed to enhance diagnostic…
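The role split described in the abstract (a Doctor as player, with Patient, Examiner, and Chief Physician as NPCs) can be illustrated with a minimal sketch. This is not the authors' implementation: every class, field, and function name below is hypothetical, the Doctor is a fixed script standing in for an LLM, and the three scores (symptom coverage, examination coverage, diagnosis match) are simplified stand-ins for the MVME views, which the abstract names but does not define. Assigning the scoring to the Chief Physician is likewise an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class MedicalRecord:
    """Hypothetical stand-in for one medical record behind a simulated case."""
    symptoms: dict       # topic -> what the patient would say
    exam_results: dict   # examination name -> reported result
    diagnosis: str       # reference diagnosis


class Patient:
    """NPC: answers the Doctor's questions from the record only."""
    def __init__(self, record):
        self.record = record

    def answer(self, topic):
        return self.record.symptoms.get(topic, "I am not sure.")


class Examiner:
    """NPC: returns results for examinations the Doctor requests."""
    def __init__(self, record):
        self.record = record

    def report(self, exam):
        return self.record.exam_results.get(exam, "No abnormality found.")


class ScriptedDoctor:
    """Player: a fixed script used here in place of an LLM policy."""
    def __init__(self, questions, exams, rule):
        self.questions = questions
        self.exams = exams
        self.rule = rule   # callable(notes) -> diagnosis string
        self.notes = {}

    def consult(self, patient, examiner):
        # Multi-turn phase: collect symptoms, then request examinations.
        for topic in self.questions:
            self.notes[topic] = patient.answer(topic)
        for exam in self.exams:
            self.notes[exam] = examiner.report(exam)
        return self.rule(self.notes)


class ChiefPhysician:
    """NPC: scores the session against the record (assumed role)."""
    def __init__(self, record):
        self.record = record

    def evaluate(self, doctor, diagnosis):
        asked = set(doctor.questions) & set(self.record.symptoms)
        ordered = set(doctor.exams) & set(self.record.exam_results)
        return {
            "symptom_coverage": len(asked) / max(len(self.record.symptoms), 1),
            "exam_coverage": len(ordered) / max(len(self.record.exam_results), 1),
            "diagnosis_correct": diagnosis.lower() == self.record.diagnosis.lower(),
        }


def run_session(record, doctor):
    """Run one simulated consultation and return the three scores."""
    diagnosis = doctor.consult(Patient(record), Examiner(record))
    return ChiefPhysician(record).evaluate(doctor, diagnosis)


# Toy case with invented content, for illustration only.
record = MedicalRecord(
    symptoms={"cough": "Dry cough for two weeks.",
              "fever": "Low-grade fever in the evenings."},
    exam_results={"chest x-ray": "Patchy infiltrate in the right lower lobe."},
    diagnosis="pneumonia",
)
doctor = ScriptedDoctor(
    questions=["cough"],
    exams=["chest x-ray"],
    rule=lambda notes: "pneumonia"
    if "infiltrate" in notes.get("chest x-ray", "") else "common cold",
)
print(run_session(record, doctor))
```

In this toy case the scripted Doctor asks about only one of the two recorded symptoms, so the symptom score is lower than the examination score even though the final diagnosis matches. Scoring each view separately is what lets a benchmark of this shape separate poor information gathering from poor diagnosis.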

Code & Models

Repositories

LibertFan/AI_Hospital (Official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education