Training Large Language Models to Predict Clinical Events

Benjamin Turtel; Paul Wilczewski; Kris Skotheim

arXiv:2605.12817 · cs.LG · May 14, 2026

TL;DR

This paper introduces a method for training large language models on longitudinal clinical notes to predict future patient events, improving calibration and Brier score over the prompted base model without hand-engineered features.

Contribution

It extends Foresight Learning to clinical prediction, creating a scalable approach that turns longitudinal notes into natural-language questions about future events, with labels resolved from later documentation.

Findings

1. Reduced expected calibration error (ECE) from 0.1269 to 0.0398 relative to the prompted base model.
2. Reduced Brier score from 0.199 to 0.145 relative to the prompted base model.
3. Slightly outperformed GPT-5 point estimates on held-out questions.
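For readers unfamiliar with the two metrics above, they can be computed as in the following minimal Python sketch. It assumes binary outcomes and one predicted probability per question. The ECE variant shown (10 equal-width bins over the positive-class probability) is an assumption: the binning scheme used in the paper is not stated here, and reported ECE values depend on it. The toy inputs are illustrative, not data from the paper.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE over the positive-class probability (assumed variant).

    Predictions are grouped into `n_bins` equal-width bins on [0, 1]; each bin
    contributes |mean predicted probability - observed positive rate|,
    weighted by the fraction of predictions that fall in it.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the top bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_prob - pos_rate)
    return ece


# Toy predictions for four yes/no questions (not data from the paper).
probs = [0.9, 0.2, 0.7, 0.4]
labels = [1, 0, 0, 1]
print(round(brier_score(probs, labels), 3))                 # 0.225
print(round(expected_calibration_error(probs, labels), 3))  # 0.4
```

Both metrics are lower-is-better. ECE measures only calibration, while the Brier score also rewards sharp, discriminative predictions, so the two can improve by different amounts.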

Abstract

Longitudinal clinical notes contain rich evidence of how patients evolve over time, but converting this signal into training supervision for clinical prediction remains challenging. We extend Foresight Learning to clinical prediction by converting time-ordered MIMIC-III notes into examples consisting of past patient context, a natural-language question about a possible future event, and a label resolved from later documentation. This process yields 6,900 prediction examples from 702 admissions across medications, procedures, organ support, microbiology, and mortality. A small LoRA adapter trained on these examples improves over the prompted base model, reducing expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145, while slightly outperforming GPT-5 point estimates on held-out questions. The approach enables reusable clinical prediction supervision from…
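The example-construction step described in the abstract can be sketched as follows. Notes are sorted by time, everything written at or before a cutoff becomes the context, and the label is resolved only from notes written after it. This is a hypothetical Python sketch, not the authors' pipeline. The `Note` and `PredictionExample` types, the `horizon` window, and the keyword-based `resolves_yes` resolver are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Note:
    timestamp: datetime  # when the note was written (hypothetical field name)
    text: str


@dataclass
class PredictionExample:
    context: str   # all notes written at or before the cutoff
    question: str  # natural-language question about a possible future event
    label: int     # 1 if a note inside the horizon resolves the question "yes"


def build_example(
    notes: List[Note],
    cutoff: datetime,
    question: str,
    horizon: timedelta,
    resolves_yes: Callable[[str], bool],
) -> PredictionExample:
    """Split one admission's notes at `cutoff` into past context and a future label."""
    ordered = sorted(notes, key=lambda n: n.timestamp)
    past = [n for n in ordered if n.timestamp <= cutoff]
    if not past:
        raise ValueError("no notes at or before the cutoff; nothing to condition on")
    # Only documentation strictly after the cutoff and within the horizon can
    # resolve the label, so the outcome never leaks into the context.
    window = [n for n in ordered if cutoff < n.timestamp <= cutoff + horizon]
    # Assumption: no matching note in the window counts as a negative label.
    label = int(any(resolves_yes(n.text) for n in window))
    context = "\n\n".join(n.text for n in past)
    return PredictionExample(context=context, question=question, label=label)


# Usage with toy notes and a crude keyword resolver (illustrative only).
t0 = datetime(2020, 1, 1, 8, 0)
notes = [
    Note(t0, "Admitted with sepsis."),
    Note(t0 + timedelta(hours=6), "Patient intubated for hypoxia."),
    Note(t0 + timedelta(hours=1), "Started on vancomycin."),
]
example = build_example(
    notes,
    cutoff=t0 + timedelta(hours=2),
    question="Will the patient be intubated within the next 24 hours?",
    horizon=timedelta(hours=24),
    resolves_yes=lambda text: "intubated" in text.lower(),
)
print(example.label)  # 1
```

Treating a window with no matching documentation as a negative is a simplifying choice. A keyword resolver would also misfire on negations such as "not intubated", so a real pipeline needs a more careful resolution step.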