Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims
Fan Ma, Yuntian Liu, Xiang Lan, Weipeng Zhou, Jun Ni, Mauro Giuffrè, Lingfei Qian, Xueqing Peng, Yujia Zhou, Ruey-Ling Weng, Huan He, Lu Li, Huiyuan Wang, Qingyu Chen, Andrew Loza, Laila Rasmy, Degui Zhi, Yuan Lu, Chenjie Zeng, Joshua C Denny, Lee Schwamm, Daniella Meeker

TL;DR
This paper introduces ReClaim, a large-scale generative transformer trained on medical claims data, demonstrating improved disease prediction, expenditure forecasting, and real-world evidence generation across diverse healthcare tasks.
Contribution
ReClaim is the first large-scale healthcare foundation model trained on nationwide claims data, outperforming existing models in disease prediction and real-world evidence applications.
Findings
ReClaim achieved a mean AUC of 75.6% on disease-onset prediction tasks.
Performance improved monotonically with model scale, with significant gains over pre-training alone.
ReClaim enhanced healthcare expenditure forecasting and reduced bias in target trial emulation.
Abstract
Evidence derived from large-scale real-world data (RWD) is increasingly informing regulatory evaluation and healthcare decision-making. Administrative claims provide population-scale, longitudinal records of healthcare utilization, expenditure, and detailed coding of diagnoses, procedures, and medications, yet their potential as a substrate for healthcare foundation models remains largely unexplored. Here we present ReClaim, a generative transformer trained from scratch on 43.8 billion medical events from more than 200 million enrollees in the MarketScan claims data spanning 2008-2022. ReClaim models longitudinal trajectories across diagnoses, procedures, medications, and expenditure, and was scaled to 140 million, 700 million, and 1.7 billion parameters. Across over 1,000 disease-onset prediction tasks, ReClaim achieved a mean AUC of 75.6%, substantially outperforming disease-specific…
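The headline metric above is a mean AUC over many binary disease-onset tasks. The page does not say how that mean is formed, so the sketch below assumes an unweighted (macro) average of per-task ROC AUCs. Each AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting as one half. The helper names `auc` and `mean_auc` and the toy task names are hypothetical and only illustrate the arithmetic. This is not ReClaim's evaluation code.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    # Pairwise comparison: O(P * N), fine for a sketch, too slow for cohorts of millions.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(tasks: dict[str, tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted (macro) mean of per-task AUCs; assumes every task counts equally."""
    per_task = [auc(labels, scores) for labels, scores in tasks.values()]
    return sum(per_task) / len(per_task)


# Hypothetical toy tasks: (onset labels, model risk scores) per patient.
tasks = {
    "task_a": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),  # every positive outranks every negative -> 1.0
    "task_b": ([1, 0, 0], [0.3, 0.5, 0.1]),          # positive beats one of two negatives -> 0.5
}
print(mean_auc(tasks))  # 0.75
```

A macro average weights a rare disease the same as a common one. A pooled or prevalence-weighted average can give a different number, so the choice matters when comparing reported means.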
