MedConsultBench: A Full-Cycle, Fine-Grained, Process-Aware Benchmark for Medical Consultation Agents
Chuhan Qiao, Jianghua Huang, Daxing Zhao, Ziding Liu, Yanjun Shen, Bing Cheng, Wei Lin, Kai Wu

TL;DR
MedConsultBench is a comprehensive benchmark for medical consultation agents that evaluates the entire clinical workflow, emphasizing process integrity, safety, and detailed inquiry logic, revealing gaps between knowledge and practical clinical skills.
Contribution
It introduces a process-aware evaluation framework built on Atomic Information Units (AIUs) and 22 fine-grained metrics, addressing the limitations of prior coarse-grained benchmarks and capturing real-world clinical reasoning.
Findings
High diagnostic accuracy does not ensure effective information gathering.
Models often struggle with medication safety and follow-up inquiries.
The benchmark reveals gaps between AI knowledge and clinical practice skills.
Abstract
Current evaluations of medical consultation agents often prioritize outcome-oriented tasks, frequently overlooking the end-to-end process integrity and clinical safety essential for real-world practice. While recent interactive benchmarks have introduced dynamic scenarios, they often remain fragmented and coarse-grained, failing to capture the structured inquiry logic and diagnostic rigor required in professional consultations. To bridge this gap, we propose MedConsultBench, a comprehensive framework designed to evaluate the complete online consultation cycle by covering the entire clinical workflow from history taking and diagnosis to treatment planning and follow-up Q&A. Our methodology introduces Atomic Information Units (AIUs) to track clinical information acquisition at a sub-turn level, enabling precise monitoring of how key facts are elicited through 22 fine-grained metrics. By…
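To make the idea of sub-turn tracking concrete, the following is a minimal sketch of how AIU acquisition might be recorded over a dialogue. It is an illustration only, not the benchmark's implementation: the `AIU` class, the `track_aius` and `coverage` functions, and the keyword-matching rule are all hypothetical assumptions introduced here, and the paper's actual matching procedure and its 22 metrics are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIU:
    """Hypothetical Atomic Information Unit: one clinical fact to be elicited."""
    name: str
    keywords: tuple  # assumption: an AIU counts as elicited if any keyword appears


def track_aius(aius, patient_turns):
    """Return {AIU name: index of the first patient turn that reveals it}.

    Several AIUs may be credited within the same turn, which is what
    makes the tracking finer-grained than turn-level scoring.
    """
    first_seen = {}
    for turn_idx, text in enumerate(patient_turns):
        lowered = text.lower()
        for aiu in aius:
            if aiu.name in first_seen:
                continue  # keep only the first turn in which the fact appeared
            if any(keyword in lowered for keyword in aiu.keywords):
                first_seen[aiu.name] = turn_idx
    return first_seen


def coverage(aius, first_seen):
    """Fraction of the required AIUs that were elicited at least once."""
    return len(first_seen) / len(aius) if aius else 0.0


# Illustrative data, invented for this sketch.
aius = [
    AIU("symptom_duration", ("three days",)),
    AIU("cough", ("cough",)),
    AIU("drug_allergy", ("allergic",)),
    AIU("current_medication", ("taking",)),
]
patient_turns = [
    "I have had a fever for three days and a dry cough.",
    "No, I am not allergic to any drugs.",
]
first_seen = track_aius(aius, patient_turns)
print(first_seen, coverage(aius, first_seen))
```

In this toy dialogue the first patient turn yields two AIUs at once, while the medication history is never elicited, which is the kind of information-gathering gap that an outcome-only score would not expose.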
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Electronic Health Records Systems · Topic Modeling
