Distilling Tabular Foundation Models for Structured Health Data

Aditya Tanna; Nassim Bouarour; Mohamed Bouadi; Vinay Kumar Sankarapu; Pratinav Seth

arXiv:2605.18702·cs.LG·May 19, 2026

TL;DR

This paper shows that knowledge distillation can transfer the predictive performance of large tabular foundation models to lightweight tabular models, giving much faster CPU inference with little loss in AUC on healthcare datasets.

Contribution

The study introduces stratified out-of-fold teacher labeling to prevent context leakage when distilling in-context tabular foundation models (TFMs), and uses it to train lightweight students that perform well on health datasets.

Findings

01

Distilled students retain at least 90% of teacher AUC.

02

Students run at least 26 times faster on CPU.

03

Multi-teacher averaging does not always outperform the best single teacher.
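
To make the quantities in these findings concrete, here is a minimal sketch of AUC retention and multi-teacher averaging. It assumes that "retention" means the ratio of student AUC to teacher AUC and that an ensemble is the plain mean of teacher probabilities; both are readings of the summary, not definitions confirmed by the paper. The function names and all numbers are hypothetical toy values, not results from the paper.

```python
def roc_auc(y_true, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_teachers(teacher_probs):
    """Element-wise mean of several teachers' predicted probabilities."""
    return [sum(col) / len(col) for col in zip(*teacher_probs)]


def auc_retention(student_auc, teacher_auc):
    """Assumed definition: student AUC as a fraction of teacher AUC."""
    return student_auc / teacher_auc


# Toy labels and toy teacher outputs (illustrative only).
y = [0, 0, 1, 1]
teacher_a = [0.1, 0.9, 0.3, 0.8]  # a weak teacher on this toy data
teacher_b = [0.2, 0.3, 0.6, 0.7]  # a strong teacher on this toy data

auc_a = roc_auc(y, teacher_a)
auc_b = roc_auc(y, teacher_b)
auc_avg = roc_auc(y, average_teachers([teacher_a, teacher_b]))
```

On this toy data the averaged ensemble scores below the stronger single teacher, which shows one way averaging can fail to beat the best teacher: a weak teacher drags the mean down.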

Abstract

Tabular foundation models (TFMs) achieve strong performance on health datasets, but their inference cost and infrastructure requirements limit practical use. We study whether their predictive behavior can be transferred to lightweight tabular models through knowledge distillation. Since in-context TFMs condition on the training set at inference time, naive distillation can introduce context leakage; we address this with stratified out-of-fold teacher labeling. Across $19$ healthcare datasets, $6$ TFM teachers, $4$ student families, and several multi-teacher ensembles, we find that distilled students retain at least $90\%$ of teacher AUC, outperforming teachers in some cases, while running at least $26 \times$ faster on CPU and preserving calibration and fairness critical for health applications. Moreover, multi-teacher averaging does not consistently improve over the best single teacher.…
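
The leakage issue described in the abstract can be sketched as follows. If an in-context teacher labels a training row while that same row (and its label) sits in the teacher's context, the soft label can simply echo the ground truth. Out-of-fold labeling avoids this by labeling each fold with a teacher whose context is the remaining folds. This is a minimal sketch under stated assumptions, not the authors' implementation: `stratified_fold_ids`, `out_of_fold_soft_labels`, and `toy_teacher` are hypothetical names, and the toy teacher is a 1-nearest-neighbour stand-in for a real TFM.

```python
import random
from collections import defaultdict


def stratified_fold_ids(y, n_splits, seed=0):
    """Assign each row a fold id so every fold gets a similar class mix."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    fold_of = [0] * len(y)
    for label in sorted(rows_by_class):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        for pos, i in enumerate(rows):
            fold_of[i] = pos % n_splits  # deal each class round-robin
    return fold_of


def out_of_fold_soft_labels(X, y, teacher_predict, n_splits=5, seed=0):
    """Label every row with a teacher whose context excludes that row's fold."""
    fold_of = stratified_fold_ids(y, n_splits, seed)
    soft = [None] * len(y)
    for k in range(n_splits):
        context = [i for i in range(len(y)) if fold_of[i] != k]
        held_out = [i for i in range(len(y)) if fold_of[i] == k]
        if not held_out:
            continue
        preds = teacher_predict(
            [X[i] for i in context],
            [y[i] for i in context],
            [X[i] for i in held_out],
        )
        for i, p in zip(held_out, preds):
            soft[i] = p
    return soft


def toy_teacher(ctx_X, ctx_y, query_X):
    """Hypothetical in-context teacher: label of the nearest context row."""
    out = []
    for q in query_X:
        j = min(range(len(ctx_X)), key=lambda j: abs(ctx_X[j][0] - q[0]))
        out.append(float(ctx_y[j]))
    return out


# Toy data: one feature, binary labels, with two deliberately noisy rows.
X = [[0.0], [0.1], [0.2], [0.9], [1.0], [1.1], [0.15], [0.95]]
y = [0, 0, 1, 1, 1, 0, 0, 1]

naive = toy_teacher(X, y, X)  # leaky: each query row is in the context
oof = out_of_fold_soft_labels(X, y, toy_teacher, n_splits=2)
fold_of = stratified_fold_ids(y, 2)
```

In the naive call every row finds itself as its own nearest neighbour, so the "soft" labels are just the training labels. In the out-of-fold call the teacher never sees the row it labels, so the noisy row at `x = 0.2` receives the label of its neighbours rather than its own.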
