TabDistill: Distilling Transformers into Neural Nets for Few-Shot Tabular Classification
Pasan Dissanayake, Sanghamitra Dutta

TL;DR
TabDistill distills the pre-trained knowledge of complex transformer-based models into simpler neural networks, keeping few-shot tabular classification performance strong while reducing the number of parameters.
Contribution
The paper introduces TabDistill, a framework for distilling transformer knowledge into neural networks, balancing parameter efficiency and few-shot learning performance.
Findings
Distilled neural networks outperform classical baselines with limited data.
Distilled models sometimes surpass original transformer models.
The framework is parameter-efficient while still performing well in the few-shot regime.
Abstract
Transformer-based models have shown promising performance on tabular data compared to their classical counterparts such as neural networks and Gradient Boosted Decision Trees (GBDTs) in scenarios with limited training data. They utilize their pre-trained knowledge to adapt to new domains, achieving commendable performance with only a few training examples, also called the few-shot regime. However, the performance gain in the few-shot regime comes at the expense of significantly increased complexity and number of parameters. To circumvent this trade-off, we introduce TabDistill, a new strategy to distill the pre-trained knowledge in complex transformer-based models into simpler neural networks for effectively classifying tabular data. Our framework yields the best of both worlds: being parameter-efficient while performing well with limited training data. The distilled neural networks…
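The abstract does not spell out how TabDistill performs the distillation, so the following is only a generic illustration of the idea of transferring a large model's knowledge into a small one, not the paper's procedure. It is a minimal sketch of soft-label distillation in plain Python, under loud assumptions: `teacher_prob` is a hypothetical stand-in for a pre-trained model, the student is a logistic regression, and the 16 synthetic rows mimic a few-shot tabular set.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def teacher_prob(x):
    """Hypothetical stand-in for a large pre-trained model: returns P(y=1 | x)."""
    return sigmoid(3.0 * x[0] - 2.0 * x[1] + 0.5)


def student_prob(w, b, x):
    """Small student model: logistic regression on the raw features."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def distill_student(rows, soft_labels, lr=0.5, epochs=1000):
    """Fit the student to the teacher's soft labels with full-batch gradient descent."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, p_teacher in zip(rows, soft_labels):
            # Gradient of the soft-label cross-entropy with respect to the logit.
            err = student_prob(w, b, x) - p_teacher
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


# Synthetic few-shot set (an assumption for illustration, not data from the paper).
rng = random.Random(0)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(16)]
soft_labels = [teacher_prob(x) for x in rows]

w, b = distill_student(rows, soft_labels)

# Fraction of rows where student and teacher predict the same class.
agreement = sum(
    (student_prob(w, b, x) > 0.5) == (teacher_prob(x) > 0.5) for x in rows
) / len(rows)
print(f"student weights: {w}, bias: {b}, agreement: {agreement}")
```

The student here needs only three parameters at inference time, which is the kind of parameter saving the abstract refers to; the teacher is queried only during training.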
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
