LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation

Xiaoming Shi; Jie Xu; Jinru Ding; Jiali Pang; Sichen Liu; Shuqing Luo; Xingwei Peng; Lu Lu; Haihong Yang; Mingtao Hu; Tong Ruan; Shaoting Zhang

arXiv:2308.07635 · cs.CL · August 16, 2023 · 2 cites

TL;DR

This paper introduces LLM-Mini-CEX, an automatic, comprehensive evaluation framework for medical diagnostic LLMs, utilizing a patient simulator and ChatGPT to automate assessment and address labor-intensive evaluation processes.

Contribution

It proposes a novel, unified evaluation criterion for medical LLMs and automates the evaluation process using ChatGPT and a patient simulator, reducing manual effort.

Findings

01. LLM-specific Mini-CEX effectively evaluates diagnostic capabilities.

02. ChatGPT can replace manual evaluation for dialogue quality.

03. Automated evaluation is reproducible and efficient.

Abstract

There is increasing interest in developing LLMs for medical diagnosis to improve diagnostic efficiency. Despite their alluring technological potential, there is no unified and comprehensive evaluation criterion, which makes it impossible to evaluate the quality and potential risks of medical LLMs and further hinders their application in medical treatment scenarios. In addition, current evaluations rely heavily on labor-intensive interactions with LLMs to obtain diagnostic dialogues and on human evaluation of the quality of those dialogues. To tackle the lack of a unified and comprehensive evaluation criterion, we first establish an evaluation criterion, termed LLM-specific Mini-CEX, based on the original Mini-CEX, to assess the diagnostic capabilities of LLMs effectively. To address the labor-intensive interaction problem, we develop a patient simulator to engage in automatic…
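
The workflow the abstract outlines (a patient simulator talking to the diagnostic LLM, followed by an automatic judge scoring the dialogue) can be pictured roughly as below. This is only an illustrative sketch, not the authors' implementation: the function names, the turn format, the fixed turn count, and the placeholder criteria `criterion_a` and `criterion_b` are all assumptions, and the patient, doctor, and judge are local stubs standing in for real model calls.

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]                         # {"role": "patient" | "doctor", "text": ...}
Responder = Callable[[List[Turn]], str]       # maps dialogue history to the next utterance
Judge = Callable[[List[Turn], str], int]      # maps (dialogue, criterion) to a score


def run_dialogue(patient: Responder, doctor: Responder, max_turns: int = 3) -> List[Turn]:
    """Alternate patient and doctor utterances for a fixed number of rounds."""
    history: List[Turn] = []
    for _ in range(max_turns):
        history.append({"role": "patient", "text": patient(history)})
        history.append({"role": "doctor", "text": doctor(history)})
    return history


def evaluate(history: List[Turn], judge: Judge, criteria: List[str]) -> Dict[str, int]:
    """Ask the judge for one score per criterion."""
    return {name: judge(history, name) for name in criteria}


# --- Hypothetical stubs; a real setup would call language models here. ---
def scripted_patient(history: List[Turn]) -> str:
    lines = ["I have had a cough for three days.", "No fever.", "Thank you."]
    spoken = sum(1 for turn in history if turn["role"] == "patient")
    return lines[min(spoken, len(lines) - 1)]


def stub_doctor(history: List[Turn]) -> str:
    return "Could you tell me more about: " + history[-1]["text"]


def stub_judge(history: List[Turn], criterion: str) -> int:
    # Toy scoring rule: count the doctor turns, ignoring the criterion name.
    return sum(1 for turn in history if turn["role"] == "doctor")


dialogue = run_dialogue(scripted_patient, stub_doctor, max_turns=3)
scores = evaluate(dialogue, stub_judge, ["criterion_a", "criterion_b"])
```

Keeping the patient, doctor, and judge as plain callables means each stub could be swapped for a model-backed function without touching the loop itself.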

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education