Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models
Aditya Tanna, Mitul Solanki, Mohamed Bouadi, Nassim Bouarour, Pratinav Seth, Vinay Kumar Sankarapu

TL;DR
This paper demonstrates that in credit risk prediction with tabular models, how data is sampled and presented (context construction) significantly impacts performance more than the choice of model architecture.
Contribution
It highlights the importance of resampling strategies for context construction in improving TFM performance on imbalanced credit datasets.
Findings
Balanced sampling improves AUC by 3-4 points over uniform sampling.
Strong TFMs with proper context can match classical models trained on full data.
Context strategy explains more variance in performance than model family.
Abstract
Credit default prediction is a tabular learning problem with severe class imbalance, heterogeneous features, and tight latency budgets. Tabular Foundation Models (TFMs) approach this problem through in-context learning, which makes their predictions sensitive to how the context window is built. We benchmark four classical models and five TFMs on the Home Credit and Lending Club datasets, varying the context-construction strategy (seven options) and the context size (1K to 50K). On both datasets, the choice of context strategy explains more variance in AUC-ROC than the choice of TFM family: balanced and hybrid sampling add 3 to 4 AUC points over uniform sampling, and the gap exceeds the spread between TFMs. With a balanced context of 5K to 10K examples, the strongest TFMs reach the AUC of classical baselines trained on the full data, while also recovering meaningful default-class recall…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
