Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator

Yusheng Liao; Yutong Meng; Yuhao Wang; Hongcheng Liu; Yanfeng Wang; Yu Wang

arXiv:2403.08495·cs.CL·July 23, 2024·2 cites

Open Access · 4 Repos

TL;DR

This paper presents the AIE framework and SAPS simulator to dynamically evaluate large language models in realistic clinical scenarios, improving assessment accuracy for medical applications.

Contribution

Introduction of the AIE framework and SAPS simulator for dynamic, realistic evaluation of medical LLMs in clinical-like multi-turn interactions.

Findings

01 AIE aligns well with human evaluations.

02 SAPS provides realistic doctor-patient simulations.

03 Enhanced assessment of LLMs in healthcare contexts.

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored. Previous work mainly measures medical knowledge through examinations, which are far from realistic scenarios and fall short in assessing the abilities of LLMs on clinical tasks. To advance the application of LLMs in healthcare, this paper introduces the Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS), targeting the gap between traditional LLM evaluations and the nuanced demands of clinical practice. Unlike prior methods that rely on static medical knowledge assessments, AIE and SAPS provide a dynamic, realistic platform for assessing LLMs through multi-turn doctor-patient simulations. This approach offers a closer…

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging

Methods: Focus · ALIGN