ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

Canyu Chen; Jian Yu; Shan Chen; Che Liu; Zhongwei Wan; Danielle Bitterman; Fei Wang; Kai Shu

arXiv:2411.06469 · cs.CL · November 12, 2024 · 3 cites

Open Access

TL;DR

This study introduces ClinicalBench, a comprehensive benchmark comparing LLMs and traditional ML models in clinical prediction, revealing that current LLMs do not outperform traditional models in this domain.

Contribution

The paper presents ClinicalBench, a new benchmark for evaluating clinical predictive models, and provides extensive empirical analysis showing LLMs' current limitations compared to traditional ML methods.

Findings

01

LLMs do not outperform traditional ML models in clinical prediction tasks.

02

Traditional ML models remain more effective for clinical decision-making.

03

LLMs show potential but need further development for clinical applications.

Abstract

Large Language Models (LLMs) hold great promise to revolutionize current clinical systems, given their strong performance on medical text processing tasks and medical licensing exams. Meanwhile, traditional ML models such as SVM and XGBoost are still the main choice for clinical prediction tasks. An emerging question is: can LLMs beat traditional ML models in clinical prediction? To answer it, we build a new benchmark, ClinicalBench, to comprehensively study the clinical predictive modeling capacities of both general-purpose and medical LLMs and to compare them with traditional ML models. ClinicalBench covers three common clinical prediction tasks, two databases, 14 general-purpose LLMs, 8 medical LLMs, and 11 traditional ML models. Through extensive empirical investigation, we discover that both general-purpose and medical LLMs, even with different model scales, diverse prompting or fine-tuning…

Taxonomy

Topics: Machine Learning in Healthcare · Natural Language Processing Techniques

Methods: Support Vector Machine · ADaptive gradient method with the OPTimal convergence rate