Benchmarking Foundation Models with Multimodal Public Electronic Health Records
Kunyu Yu, Rui Yang, Jingchi Liao, Siqi Li, Huitao Li, Irene Li, Yifan Peng, Rishikesan Kamaleswaran, Nan Liu

TL;DR
This paper introduces a comprehensive benchmark for evaluating foundation models on electronic health records, focusing on performance, fairness, and interpretability across unimodal and multimodal data using the MIMIC-IV dataset.
Contribution
The paper provides a standardized evaluation pipeline for MIMIC-IV and compares eight foundation models, highlighting the benefits of multimodal data in clinical prediction tasks.
Findings
Incorporating multiple data modalities consistently improves predictive performance.
Adding modalities does not increase bias.
The benchmark is intended to support the development of trustworthy AI in healthcare.
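The performance and bias findings above both come down to comparing a metric across models and across patient subgroups. The following is a minimal sketch of that kind of comparison, not the paper's pipeline. This summary does not say which metrics the benchmark uses, so AUROC and the max-minus-min subgroup gap are assumptions, and the function names `auroc` and `subgroup_auroc_gap` are hypothetical.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auroc_gap(labels, scores, groups):
    """Return per-group AUROC and the gap between the best and worst group.

    The max-minus-min gap is an illustrative fairness measure (an assumption),
    not necessarily the one used in the benchmark.
    """
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    per_group = {g: auroc(ys, ss) for g, (ys, ss) in by_group.items()}
    return per_group, max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    # Toy data only: eight hypothetical patients in two hypothetical subgroups.
    labels = [1, 0, 1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.1, 0.8, 0.2, 0.9, 0.1, 0.4, 0.6]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    per_group, gap = subgroup_auroc_gap(labels, scores, groups)
    print(per_group, gap)
```

Running the same two functions on a unimodal model's scores and on a multimodal model's scores would give one way to check whether overall AUROC rises while the subgroup gap stays flat.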
Abstract
Foundation models have emerged as a powerful approach for processing electronic health records (EHRs), offering flexibility to handle diverse medical data modalities. In this study, we present a comprehensive benchmark that evaluates the performance, fairness, and interpretability of foundation models, both as unimodal encoders and as multimodal learners, using the publicly available MIMIC-IV database. To support consistent and reproducible evaluation, we developed a standardized data processing pipeline that harmonizes heterogeneous clinical records into an analysis-ready format. We systematically compared eight foundation models, encompassing both unimodal and multimodal models, as well as domain-specific and general-purpose variants. Our findings demonstrate that incorporating multiple data modalities leads to consistent improvements in predictive performance without introducing…
