Distilling Tabular Foundation Models for Structured Health Data
Aditya Tanna, Nassim Bouarour, Mohamed Bouadi, Vinay Kumar Sankarapu, Pratinav Seth

TL;DR
This paper demonstrates that knowledge distillation can effectively transfer the predictive performance of large tabular foundation models to lightweight models, enabling faster inference without significant accuracy loss in healthcare data applications.
Contribution
The study introduces a stratified out-of-fold teacher labeling method to prevent context leakage during distillation of TFMs, achieving high-performance lightweight models for health datasets.
Findings
Distilled students retain at least 90% of teacher AUC.
Students run at least 26 times faster on CPU.
Multi-teacher averaging does not always outperform the best single teacher.
Abstract
Tabular foundation models (TFMs) achieve strong performance on health datasets, but their inference cost and infrastructure requirements limit practical use. We study whether their predictive behavior can be transferred to lightweight tabular models through knowledge distillation. Since in-context TFMs condition on the training set at inference time, naive distillation can introduce context leakage; we address this with stratified out-of-fold teacher labeling. Across healthcare datasets, TFM teachers, student families, and several multi-teacher ensembles, we find that distilled students retain at least 90% of teacher AUC, outperforming teachers in some cases, while running at least 26 times faster on CPU and preserving calibration and fairness critical for health applications. Moreover, multi-teacher averaging does not consistently improve over the best single teacher.…
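To make the leakage argument concrete, here is a minimal sketch of stratified out-of-fold teacher labeling. It is an illustration under stated assumptions, not the authors' implementation: the `RandomForestClassifier` teacher is only a stand-in for a TFM (any estimator with `fit` and `predict_proba` would slot in), the `DecisionTreeRegressor` student, the synthetic data, and the helper name `oof_teacher_labels` are all hypothetical. The point it shows is that each row's soft label comes from a teacher whose fitting set (its context, for an in-context model) excluded that row.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeRegressor


def oof_teacher_labels(teacher, X, y, n_splits=5, seed=0):
    """Soft labels where every row is scored by a teacher that never saw it.

    Hypothetical helper: the name and signature are illustrative only.
    """
    soft = np.full(len(y), np.nan)
    # Stratification keeps the class ratio similar in every fold, which
    # matters for imbalanced outcomes.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, label_idx in skf.split(X, y):
        fold_teacher = clone(teacher)
        # For an in-context model, the rows passed to fit() act as the context.
        fold_teacher.fit(X[fit_idx], y[fit_idx])
        # Label only the held-out rows, so no row is in its own context.
        soft[label_idx] = fold_teacher.predict_proba(X[label_idx])[:, 1]
    return soft


# Synthetic, imbalanced binary task standing in for a health dataset.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Stand-in teacher (assumption): not a TFM, just something with predict_proba.
teacher = RandomForestClassifier(n_estimators=50, random_state=0)
soft_labels = oof_teacher_labels(teacher, X, y)

# Lightweight student (assumption) regresses on the teacher's soft labels.
student = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, soft_labels)
student_scores = np.clip(student.predict(X), 0.0, 1.0)
```

By contrast, a naive variant would fit the teacher on all rows and then label those same rows, so each soft label could reflect the row's own ground-truth outcome rather than the teacher's generalizing behavior.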
