Efficient Generative Prediction for EHR Foundation Models: The SCOPE and REACH Estimators

Luke Solo; Matthew B.A. McDermott; William F. Parker; Bashar Ramadan; Michael C. Burkhart; Brett K. Beaulieu-Jones

arXiv:2602.03730·stat.ML·May 14, 2026

TL;DR

This paper introduces two estimators, SCOPE and REACH, for outcome prediction with generative EHR models. Both use the model's next-token probabilities to reduce sampling variance and computational cost relative to standard Monte Carlo.

Contribution

The paper proposes the SCOPE and REACH estimators, which leverage next-token probabilities that standard Monte Carlo underuses. Both are proven unbiased, REACH is proven to reduce variance relative to Monte Carlo for any model and outcome, and both are validated empirically on clinical outcomes.

Findings

1. SCOPE and REACH match 100-sample Monte Carlo accuracy with fewer samples.

2. SCOPE and REACH achieve a 2.5x to 3.4x token reduction, exceeding 80x for rare outcomes.

3. Calibration is preserved across all evaluated outcomes.

Abstract

Generative foundation models trained on tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction via Monte Carlo sampling of simulated future trajectories. However, this approach suffers from three coupled limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational cost, and high sampling variance. We propose two new estimators that leverage next-token probability distributions underutilized by standard Monte Carlo: the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH). We prove both are unbiased, that REACH guarantees variance reduction over Monte Carlo for any model and outcome, and that REACH is a Rao-Blackwellization of any naive importance sampling scheme that preserves the non-outcome token distribution. Empirically,…
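The contrast between counting sampled outcomes and reusing the model's outcome probabilities can be illustrated on a toy problem. The sketch below is not the paper's implementation: the two-state "model", its `HAZARD` and `SWITCH` tables, the horizon, and the functions `hazard_sum_sample` and `survival_product_sample` are all hypothetical constructions, loosely inspired by the estimator names, and the paper's actual definitions of SCOPE and REACH may differ. It only shows that estimators built from per-step outcome probabilities can target the same risk as plain Monte Carlo.

```python
import random
import statistics

# Hypothetical toy "model": a hidden state in {0, 1}. At each step the model
# assigns probability HAZARD[state] to the outcome token; otherwise the state
# flips with probability SWITCH[state]. All numbers are illustrative.
HAZARD = {0: 0.01, 1: 0.05}
SWITCH = {0: 0.10, 1: 0.20}
HORIZON = 20


def step(state, rng):
    """Sample the next state from the non-outcome token distribution."""
    return 1 - state if rng.random() < SWITCH[state] else state


def exact_risk(state=0, horizon=HORIZON):
    """Exact P(outcome within horizon), by dynamic programming on survival."""
    surv = {0: 1.0, 1: 1.0}
    for _ in range(horizon):
        surv = {
            s: (1 - HAZARD[s])
            * ((1 - SWITCH[s]) * surv[s] + SWITCH[s] * surv[1 - s])
            for s in (0, 1)
        }
    return 1 - surv[state]


def monte_carlo_sample(rng, state=0):
    """Standard Monte Carlo: 1.0 if the outcome token is sampled, else 0.0."""
    for _ in range(HORIZON):
        if rng.random() < HAZARD[state]:
            return 1.0
        state = step(state, rng)
    return 0.0


def hazard_sum_sample(rng, state=0):
    """Assumed sum-style estimator: add outcome probabilities until the outcome."""
    total = 0.0
    for _ in range(HORIZON):
        total += HAZARD[state]
        if rng.random() < HAZARD[state]:
            break
        state = step(state, rng)
    return total


def survival_product_sample(rng, state=0):
    """Assumed hazard-style estimator: never sample the outcome token and
    return 1 minus the product of per-step survival probabilities."""
    survival = 1.0
    for _ in range(HORIZON):
        survival *= 1 - HAZARD[state]
        state = step(state, rng)
    return 1 - survival


def estimate(sampler, n, seed):
    """Return (mean, variance) of n independent draws from a sampler."""
    rng = random.Random(seed)
    draws = [sampler(rng) for _ in range(n)]
    return statistics.mean(draws), statistics.pvariance(draws)


if __name__ == "__main__":
    print("exact risk:", exact_risk())
    for sampler in (monte_carlo_sample, hazard_sum_sample, survival_product_sample):
        mean, var = estimate(sampler, 20000, seed=0)
        print(sampler.__name__, "mean:", mean, "variance:", var)
```

In this toy setting all three samplers target the same quantity. The Monte Carlo draws are binary, whereas the other two return graded values per trajectory, which is one way to read the abstract's point about sparse estimate distributions.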
