Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints

Minh-Khoi Pham; Thang-Long Nguyen Ho; Thao Thi Phuong Dao; Tai Tan Mai; Minh-Triet Tran; Marie E. Ward; Una Geary; Rob Brennan; Nick McDonald; Martin Crane; Marija Bezbradica

arXiv:2604.01841 · cs.AI · April 3, 2026

TL;DR

This paper evaluates retrieval-aligned tabular foundation models for clinical risk prediction in EHRs, introducing AWARE to improve retrieval quality and robustness under real-world clinical data challenges.

Contribution

The paper presents AWARE, a task-aligned retrieval framework that improves clinical risk prediction by addressing retrieval quality and alignment issues in tabular in-context learning.

Findings

01. PFN-based TICL models are sample-efficient in low-data regimes.

02. AWARE improves AUPRC by up to 12.2% under extreme imbalance.

03. Retrieval quality and alignment are key bottlenecks for clinical deployment.

Abstract

Clinical prediction from structured electronic health records (EHRs) is challenging due to high dimensionality, heterogeneity, class imbalance, and distribution shift. While tabular in-context learning (TICL) and retrieval-augmented methods perform well on generic benchmarks, their behavior in clinical settings remains unclear. We present a multi-cohort EHR benchmark comparing classical, deep tabular, and TICL models across varying data scale, feature dimensionality, outcome rarity, and cross-cohort generalization. PFN-based TICL models are sample-efficient in low-data regimes but degrade under naive distance-based retrieval as heterogeneity and imbalance increase. We propose AWARE, a task-aligned retrieval framework using supervised embedding learning and lightweight adapters. AWARE improves AUPRC by up to 12.2% under extreme imbalance, with gains increasing with data complexity. Our…
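
To make the phrase "naive distance-based retrieval" concrete, here is a minimal sketch of how a context set might be selected for a query row by plain Euclidean distance. This is not AWARE and not the authors' code: the function name `retrieve_context`, the parameter `k`, and the synthetic data are all hypothetical, chosen only for illustration. The toy data uses a 5% positive rate so that one can inspect how few positives a purely distance-based context may contain.

```python
import numpy as np

def retrieve_context(X_train, y_train, x_query, k=5):
    """Return the k training rows closest to x_query (Euclidean distance),
    their labels, and their indices. Hypothetical helper, not from the paper."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Stable sort so ties are broken by original row order.
    idx = np.argsort(dists, kind="stable")[:k]
    return X_train[idx], y_train[idx], idx

# Synthetic, imbalanced toy cohort (assumption: 95 negatives, 5 positives).
rng = np.random.default_rng(0)
X_neg = rng.normal(0.0, 1.0, size=(95, 4))
X_pos = rng.normal(1.0, 1.0, size=(5, 4))
X_train = np.vstack([X_neg, X_pos])
y_train = np.concatenate([np.zeros(95, dtype=int), np.ones(5, dtype=int)])

# A query located at the mean of the positive class.
x_query = np.ones(4)
ctx_X, ctx_y, ctx_idx = retrieve_context(X_train, y_train, x_query, k=10)

# Share of positives in the retrieved context; with rare outcomes this can
# be small even for a query that sits in the positive region.
positive_fraction = ctx_y.mean()
```

In a retrieval-augmented TICL setup, rows like `ctx_X` and `ctx_y` would be passed to the model as in-context examples. The abstract's proposal replaces the raw-feature distance used here with a learned, task-aligned embedding, which this sketch does not attempt to reproduce.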
