TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications
David Salinas, Nick Erickson

TL;DR
TabRepo is a comprehensive dataset of tabular model evaluations that enables advanced analysis, transfer-learning, and improvements over current AutoML systems in accuracy and efficiency.
Contribution
We present TabRepo, a large-scale repository of tabular model evaluations that facilitates analysis and transfer learning and improves AutoML performance.
Findings
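Enables comparing Hyperparameter Optimization against current AutoML systems, including ensembling at marginal cost from precomputed predictions.
Supports transfer learning that outperforms current state-of-the-art tabular systems.
Improves accuracy, runtime, and latency when standard transfer-learning techniques are applied.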
Abstract
We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1310 models evaluated on 200 classification and regression datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it enables analyses such as comparing Hyperparameter Optimization against current AutoML systems, while also considering ensembling at marginal cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques outperforms current state-of-the-art tabular systems in accuracy, runtime, and latency.
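To illustrate why ensembling becomes cheap once model predictions are stored, here is a minimal sketch of greedy ensemble selection over cached validation predictions. It is not the TabRepo API, and the abstract does not say which ensembling method the authors use; the greedy procedure, the function name `greedy_ensemble`, and the synthetic data are all assumptions made for illustration. The point is only that no model is retrained: the ensemble is built from arrays that already exist.

```python
import numpy as np


def greedy_ensemble(val_preds, y_val, n_rounds=10):
    """Greedy ensemble selection with replacement (illustrative sketch).

    val_preds: (n_models, n_samples) cached validation predictions.
    y_val:     (n_samples,) validation targets.
    Returns weights of shape (n_models,) that sum to 1.
    """
    n_models = val_preds.shape[0]
    counts = np.zeros(n_models, dtype=int)
    running_sum = np.zeros(y_val.shape, dtype=float)
    best_error, best_counts = np.inf, counts.copy()
    for step in range(1, n_rounds + 1):
        # Validation error of the ensemble if each model were added next.
        candidate_means = (running_sum[None, :] + val_preds) / step
        errors = np.mean((candidate_means - y_val[None, :]) ** 2, axis=1)
        pick = int(np.argmin(errors))
        counts[pick] += 1
        running_sum += val_preds[pick]
        # Remember the best ensemble seen so far across all rounds.
        if errors[pick] < best_error:
            best_error, best_counts = errors[pick], counts.copy()
    return best_counts / best_counts.sum()


# Hypothetical stand-in for precomputed predictions: four regressors whose
# outputs are the target plus independent noise of different magnitudes.
rng = np.random.default_rng(0)
y_val = rng.normal(size=200)
noise_scale = np.array([0.5, 0.8, 1.0, 1.5])[:, None]
val_preds = y_val[None, :] + noise_scale * rng.normal(size=(4, 200))

weights = greedy_ensemble(val_preds, y_val)
single_mse = np.mean((val_preds - y_val[None, :]) ** 2, axis=1)
ensemble_mse = float(np.mean((weights @ val_preds - y_val) ** 2))
print("weights:", weights)
print("best single-model MSE:", single_mse.min(), "ensemble MSE:", ensemble_mse)
```

Because the first round always picks the best single model and the sketch keeps the best ensemble seen across rounds, the selected ensemble is never worse than the best single model on the validation data. In practice, the weights would then be applied to the cached test predictions of the same models.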
Taxonomy
Topics: Machine Learning and Data Classification · Model-Driven Software Engineering Techniques
