ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases
Yuchong Li, Xiaojun Zeng, Chihua Fang, Jian Yang, Fucang Jia, Lei Zhang

TL;DR
This paper introduces ClinBench-HPB, a comprehensive benchmark for evaluating large language models on hepato-pancreato-biliary diseases, highlighting current models' limitations in complex clinical diagnosis tasks.
Contribution
The paper establishes a new HPB-focused evaluation benchmark with 3,535 multiple-choice questions and 337 open-ended clinical cases, addressing the lack of HPB coverage and real clinical cases in existing medical LLM benchmarks.
Findings
LLMs perform well on exam questions but poorly on complex clinical cases
Current LLMs show limited generalizability to HPB diseases
Significant performance gaps highlight need for specialized medical LLMs
Abstract
Hepato-pancreato-biliary (HPB) disorders represent a global public health challenge due to their high morbidity and mortality. Although large language models (LLMs) have shown promising performance in general medical question-answering tasks, the current evaluation benchmarks are mostly derived from standardized examinations or manually designed questions, lacking HPB coverage and clinical cases. To address these issues, we systematically establish an HPB disease evaluation benchmark comprising 3,535 closed-ended multiple-choice questions and 337 open-ended real diagnosis cases, which encompasses all the 33 main categories and 465 subcategories of HPB diseases defined in the International Statistical Classification of Diseases, 10th Revision (ICD-10). The multiple-choice questions are curated from public datasets and synthesized data, and the clinical cases are collected from…
Taxonomy
Topics: Cholangiocarcinoma and Gallbladder Cancer Studies · Pancreatic and Hepatic Oncology Research · Genomics and Rare Diseases
