Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

Aditya Tanna; Nassim Bouarour; Mohamed Bouadi; Vinay kumar Sankarapu; Pratinav Seth

arXiv:2605.18654 · cs.LG · May 19, 2026

TL;DR

This paper distills tabular foundation models (TFMs) into CPU-efficient gradient-boosted trees. Across 153 classification datasets, distilling TabICLv2 into XGBoost retains 96.5% of teacher AUC at 1.9 ms on CPU, and speedups range from 38x to 860x across teacher-student pairs.

Contribution

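It introduces stratified out-of-fold (OOF) teacher labeling, which stops in-context learning (ICL) teachers from leaking labels when they score their own training set, so their soft targets keep the inter-class structure needed to distill them into fast, CPU-ready gradient-boosted tree models.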

Findings

01. Distilling TabICLv2 into XGBoost retains 96.5% of teacher AUC (0.882 macro-mean AUC).

02. Distillation yields a 38x to 860x speedup across teacher-student pairs.

03. Teacher rank transfers to students exactly.

Abstract

A fraud scorer needs to answer in under 2 ms. The best tabular foundation models (TFMs) take 151-1,275 ms on GPU. We close this gap by distilling the TFM offline into an XGBoost or CatBoost student that runs natively on CPU. The central obstacle is specific to in-context learning (ICL) teachers: they leak labels when scoring their own training set, so the soft targets collapse to near-one-hot vectors with no inter-class structure left to distill. Stratified out-of-fold (OOF) teacher labeling prevents this. Across 153 classification datasets drawn from TALENT, OpenML-CC18, TabZilla, and TabArena, distilling TabICLv2 into XGBoost gives 0.882 macro-mean AUC (96.5% of teacher AUC) at 1.9 ms on CPU, a 38x to 860x speedup across teacher-student pairs with a statistically significant edge over a tuned CatBoost baseline (Wilcoxon p = 0.0008; 51% win rate). Four further findings: teacher rank…
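The label-leakage problem and the OOF fix described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' pipeline: a distance-weighted k-NN classifier from scikit-learn stands in for the ICL teacher (like an ICL model, it reads the labels of its context set at prediction time), scikit-learn's `GradientBoostingRegressor` stands in for the XGBoost or CatBoost student, the data is synthetic and binary, and the helper names `make_teacher` and `oof_soft_labels` and the choice of 5 folds are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def make_teacher():
    # Stand-in teacher (assumption, NOT a TFM): distance-weighted k-NN
    # consults the labels of its context set when it predicts.
    return KNeighborsClassifier(n_neighbors=15, weights="distance")


def oof_soft_labels(X, y, n_splits=5, seed=0):
    """Teacher probabilities for every row, each produced by a teacher
    whose context excluded that row (stratified out-of-fold labeling)."""
    soft = np.zeros((len(y), len(np.unique(y))))
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for context_idx, held_out_idx in folds.split(X, y):
        teacher = make_teacher().fit(X[context_idx], y[context_idx])
        soft[held_out_idx] = teacher.predict_proba(X[held_out_idx])
    return soft


# Synthetic binary task (assumption; the paper uses real benchmark datasets).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Leaky labeling: the teacher scores rows that are in its own context,
# so each row finds its own label and the targets collapse toward one-hot.
soft_leaky = make_teacher().fit(X_train, y_train).predict_proba(X_train)

# OOF labeling: every row is scored by a teacher that never saw its label.
soft_oof = oof_soft_labels(X_train, y_train)

# Stand-in student: boosted trees regress on the teacher's P(class 1).
student = GradientBoostingRegressor(random_state=0)
student.fit(X_train, soft_oof[:, 1])
student_scores = np.clip(student.predict(X_test), 0.0, 1.0)
student_auc = roc_auc_score(y_test, student_scores)

print("mean max-probability, leaky targets:", soft_leaky.max(axis=1).mean())
print("mean max-probability, OOF targets:  ", soft_oof.max(axis=1).mean())
print("student test AUC:", student_auc)
```

Comparing the two printed mean max-probabilities shows the collapse: the leaky targets sit at or near 1.0 for every row, while the OOF targets stay softer and keep graded confidence for the student to learn from. Stratifying the folds keeps every class present in each teacher context, so the probability columns line up across folds.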
