Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication
Philip Chung, Christine T Fong, Andrew M Walters, Nima Aghaeepour,, Meliha Yetisgen, Vikas N O'Reilly-Shah

TL;DR
This study evaluates GPT-4 Turbo's ability to predict perioperative risks and outcomes from clinical notes, showing moderate success in classification tasks but poor performance in duration predictions, indicating potential clinical utility.
Contribution
It demonstrates that large language models can assist in perioperative risk stratification and outcome prediction using clinical notes, highlighting their strengths and limitations.
Findings
Achieved F1 scores of 0.50 for ASA classification
Achieved F1 scores of 0.81 for ICU admission
Achieved F1 scores of 0.86 for hospital mortality
Abstract
We investigate whether general-domain large language models such as GPT-4 Turbo can perform risk stratification and predict post-operative outcome measures using a description of the procedure and a patient's clinical notes derived from the electronic health record. We examine predictive performance on 8 different tasks: prediction of ASA Physical Status Classification, hospital admission, ICU admission, unplanned admission, hospital mortality, PACU Phase 1 duration, hospital duration, and ICU duration. Few-shot and chain-of-thought prompting improves predictive performance for several of the tasks. We achieve F1 scores of 0.50 for ASA Physical Status Classification, 0.81 for ICU admission, and 0.86 for hospital mortality. Performance on duration prediction tasks were universally poor across all prompt strategies. Current generation large language models can assist clinicians in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning in Healthcare · Cardiac, Anesthesia and Surgical Outcomes · Artificial Intelligence in Healthcare and Education
MethodsAttention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Label Smoothing · Multi-Head Attention · Adam · Dropout · Absolute Position Encodings · Layer Normalization
