Benchmarking LLMs for Predictive Applications in the Intensive Care Units
Chehak Malhotra, Mehak Gopal, Akshaya Devadiga, Pradeep Singh, Ridam Pal, Ritwik Kashyap, Tavpritesh Sethi

TL;DR
This study benchmarks several large language models (LLMs) for predicting shock in ICU patients. It finds that LLMs are not necessarily superior to smaller models on clinical prediction tasks and emphasizes the need for models focused on clinical trajectory prediction.
Contribution
It provides a comparative analysis of LLMs and small language models (SLMs) for ICU shock prediction, highlights the limited advantage of LLMs in this domain, and suggests future directions for model development.
Findings
GatorTron Base achieved 80.5% weighted recall.
Performance was similar between LLMs and SLMs.
LLMs are not inherently better for clinical event prediction.
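The weighted recall reported above can be illustrated with a small sketch. It assumes the usual definition (per-class recall averaged with weights proportional to each class's support, as in scikit-learn's `recall_score(average="weighted")`); the source does not state which implementation the authors used, and the function name and toy labels below are hypothetical.

```python
def weighted_recall(y_true, y_pred):
    """Support-weighted average of per-class recall (illustrative sketch)."""
    n = len(y_true)
    score = 0.0
    for cls in sorted(set(y_true)):
        # Support: number of true examples belonging to this class.
        support = sum(1 for t in y_true if t == cls)
        # Hits: examples of this class that were predicted correctly.
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        score += (support / n) * (hits / support)
    return score


# Toy labels, not data from the paper: 0 = normal SI, 1 = abnormal SI.
toy_true = [0, 0, 0, 1]
toy_pred = [0, 0, 1, 1]
toy_score = weighted_recall(toy_true, toy_pred)
```

Under this definition, weighted recall in a single-label task equals plain accuracy, so on imbalanced data a high value can be driven mostly by the majority class.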
Abstract
With the advent of LLMs, various tasks across the natural language processing domain have been transformed. However, their application to predictive tasks remains less explored. This study compares large language models, including GatorTron-Base (trained on clinical data), Llama 8B, and Mistral 7B, against models such as BioBERT, DocBERT, BioClinicalBERT, Word2Vec, and Doc2Vec, setting benchmarks for predicting shock in critically ill patients. Timely prediction of shock can enable early interventions, thus improving patient outcomes. Text data from 17,294 ICU stays of patients in the MIMIC-III database were scored for length of stay > 24 hours and shock index (SI) > 0.7 to yield 355 and 87 patients with normal and abnormal SI, respectively. Both focal and cross-entropy losses were used during fine-tuning to address class imbalance. Our findings indicate that while GatorTron Base…
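The abstract mentions focal and cross-entropy losses as a way to handle class imbalance. The sketch below shows how the two relate, assuming the standard binary focal loss formulation, -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the `gamma` and `alpha` values are illustrative assumptions, not the authors' settings or code.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=None):
    """Mean binary focal loss over a batch (illustrative sketch).

    probs   -- predicted probability of the positive class for each example
    targets -- 0/1 ground-truth labels
    gamma   -- focusing parameter; gamma = 0 recovers cross-entropy
    alpha   -- optional weight for the positive class (1 - alpha for negatives)
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability the model assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
        if alpha is None:
            weight = 1.0
        else:
            weight = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of well-classified examples.
        total += -weight * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def cross_entropy(probs, targets):
    """Binary cross-entropy, i.e. focal loss with gamma = 0 and no alpha."""
    return focal_loss(probs, targets, gamma=0.0)


# Hypothetical predictions: one confident correct example, one hard example.
easy = focal_loss([0.9], [1])
hard = focal_loss([0.1], [1])
```

The focusing term down-weights easy examples, which tend to come from the majority class, so harder minority-class examples contribute relatively more to the gradient.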
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Sepsis Diagnosis and Treatment
