CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment
Yakun Zhu, Zhongzhen Huang, Qianhan Feng, Linjie Mu, Yannian Gu, Shaoting Zhang, Qi Dou, Xiaofan Zhang

TL;DR
CP-Env is a new benchmark environment for evaluating large language models on complex, dynamic clinical pathways in hospital settings, highlighting their strengths and weaknesses in realistic medical scenarios.
Contribution
Introduces CP-Env, a controllable hospital environment for comprehensive evaluation of LLMs on end-to-end clinical pathways, including new evaluation frameworks and tools.
Findings
Most models struggle with pathway complexity and hallucinations.
Excessive reasoning steps can be counterproductive.
Top models show reduced tool dependency through internalized knowledge.
Abstract
Medical care follows complex clinical pathways that extend beyond isolated physician-patient encounters, emphasizing decision-making and transitions between different stages. Current benchmarks, which focus on static exams or isolated dialogues, inadequately evaluate large language models (LLMs) in dynamic clinical scenarios. We introduce CP-Env, a controllable agentic hospital environment designed to evaluate LLMs across end-to-end clinical pathways. CP-Env simulates a hospital ecosystem with patient and physician agents, constructing scenarios for agent interaction that range from triage and specialist consultation to diagnostic testing and multidisciplinary team meetings. Following the adaptive flow of care in real hospitals, it enables branching, long-horizon task execution. We propose a three-tiered evaluation framework encompassing Clinical Efficacy, Process Competency, and Professional…
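To make the idea of a branching clinical pathway concrete, the sketch below models a pathway as a small state machine in Python. It is only an illustration, not CP-Env's implementation or API: the stage names are borrowed from the abstract, while the patient flags (needs_tests, complex_case), the transition rules, and the run_pathway helper are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A patient is represented by a few boolean flags (hypothetical, for illustration only).
Patient = Dict[str, bool]

# Each stage maps a patient to the next stage name, or None when the pathway ends.
# Stage names come from the abstract; the branching rules are invented assumptions.
TRANSITIONS: Dict[str, Callable[[Patient], Optional[str]]] = {
    "triage": lambda p: "specialist_consultation",
    "specialist_consultation": lambda p: (
        "diagnostic_testing" if p.get("needs_tests", False) else None
    ),
    "diagnostic_testing": lambda p: (
        "mdt_meeting" if p.get("complex_case", False) else None
    ),
    "mdt_meeting": lambda p: None,
}


def run_pathway(patient: Patient, start: str = "triage", max_steps: int = 10) -> List[str]:
    """Walk the pathway from `start`, returning the ordered list of visited stages."""
    visited: List[str] = []
    stage: Optional[str] = start
    # max_steps bounds the walk so a misconfigured transition table cannot loop forever.
    while stage is not None and len(visited) < max_steps:
        visited.append(stage)
        stage = TRANSITIONS[stage](patient)
    return visited


if __name__ == "__main__":
    print(run_pathway({"needs_tests": False}))
    print(run_pathway({"needs_tests": True, "complex_case": True}))
```

Different patient flags produce pathways of different lengths, which is the sense in which an evaluation over such an environment is branching and long-horizon rather than a single fixed dialogue.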
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills
