ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu, Xuandong Zhao

TL;DR
ClinicalLab introduces a comprehensive benchmark and evaluation suite for medical agents. It addresses limitations of existing benchmarks by covering multi-departmental cases, real clinical scenarios, and new evaluation metrics, with the aim of improving LLMs in clinical diagnostics.
Contribution
The paper presents ClinicalLab, a novel multi-departmental clinical diagnostic benchmark and a new alignment method for medical agents, enhancing evaluation realism and addressing prior limitations.
Findings
Performance of 17 LLMs varies across departments.
ClinicalAgent improves alignment with real-world clinical practices.
New metrics effectively evaluate clinical diagnostic effectiveness.
Abstract
LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with the real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluations of end-to-end real clinical scenarios. These limitations in benchmarks in turn obstruct…
Taxonomy
Topics: Artificial Intelligence in Healthcare · Electronic Health Records Systems · Biomedical Text Mining and Ontologies
Methods: ALIGN
