TL;DR
This paper rigorously compares recently proposed deep learning models with XGBoost on tabular data. XGBoost generally outperforms the deep models in accuracy and needs much less tuning, though ensembling deep models with XGBoost improves performance.
Contribution
The paper provides a comprehensive comparison showing that tree ensemble models such as XGBoost outperform recent deep models on tabular data, and it highlights the importance of model selection and ensembling.
Findings
XGBoost outperforms the deep models across the datasets, including the datasets used in the papers that proposed those models.
XGBoost requires much less hyperparameter tuning than the deep models.
Ensembling deep models with XGBoost improves performance.
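To make the ensembling finding concrete, here is a minimal sketch of one common way to combine models: a weighted average of per-class predicted probabilities. This summary does not say how the paper builds its ensemble, so the averaging scheme, the function names (ensemble_probabilities, predict_labels), and the example probabilities are all assumptions made for illustration, not the authors' method or results.

```python
def ensemble_probabilities(model_probs, weights=None):
    """Weighted average of class probabilities from several models.

    model_probs: one entry per model; each entry is a list of rows,
                 and each row is a list of per-class probabilities.
    weights:     optional per-model weights; defaults to equal weights.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so weights sum to 1

    n_rows = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    combined = []
    for i in range(n_rows):
        row = [
            sum(weights[m] * model_probs[m][i][c] for m in range(n_models))
            for c in range(n_classes)
        ]
        combined.append(row)
    return combined


def predict_labels(probs):
    """Pick the most probable class index for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical predicted probabilities for three samples and two classes.
# In practice these would come from a fitted XGBoost model and a fitted
# deep tabular model; the numbers here are made up for illustration.
xgb_probs = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
deep_probs = [[0.8, 0.2], [0.2, 0.8], [0.3, 0.7]]

ensemble = ensemble_probabilities([xgb_probs, deep_probs])
labels = predict_labels(ensemble)
```

In this toy input the two models disagree on the third sample, and the averaged probabilities side with the more confident model. Whether an ensemble helps on real data depends on the models and datasets involved.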
Abstract
A key element in solving real-life data science problems is selecting the types of models to use. Tree ensemble models (such as XGBoost) are usually recommended for classification and regression problems with tabular data. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use cases. This paper explores whether these deep models should be a recommended option for tabular data by rigorously comparing the new deep models to XGBoost on various datasets. In addition to systematically comparing their performance, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble…
