Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data
Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang

TL;DR
This paper studies how to automate feature preprocessing for tabular data by casting it as a hyperparameter optimization (HPO) or neural architecture search (NAS) problem, and evaluates 15 search algorithms on 45 datasets to identify which strategies are effective.
Contribution
The paper models Auto-FP as an HPO or NAS problem, extends existing HPO and NAS algorithms to Auto-FP, and provides a comprehensive evaluation showing that evolution-based and random search methods are effective.
Findings
Evolution-based algorithms perform best overall.
Random search is a surprisingly strong baseline.
Many advanced algorithms do not outperform random search.
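To make the two strongest families in these findings concrete, here is a minimal sketch of random search and a simple evolution-based search over ordered preprocessing pipelines. It is an illustration under stated assumptions, not the paper's implementation: the candidate list `CANDIDATES`, the mutation operators, the tournament and aging scheme, and the stand-in objective `toy_score` are all hypothetical choices made for this example.

```python
import random
from typing import Callable, Sequence, Tuple

Pipeline = Tuple[str, ...]

# Hypothetical candidate preprocessors (names only; this list is an assumption).
CANDIDATES = ("StandardScaler", "MinMaxScaler", "Normalizer", "PowerTransformer")


def sample_pipeline(rng: random.Random, candidates: Sequence[str], max_length: int) -> Pipeline:
    """Draw a random ordered pipeline of length 1..max_length (repeats allowed)."""
    length = rng.randint(1, max_length)
    return tuple(rng.choice(candidates) for _ in range(length))


def random_search(evaluate: Callable[[Pipeline], float], candidates: Sequence[str],
                  max_length: int, budget: int, seed: int = 0) -> Tuple[Pipeline, float]:
    """Evaluate `budget` independent random pipelines and keep the best one."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    best_pipeline, best_score = None, float("-inf")
    for _ in range(budget):
        pipeline = sample_pipeline(rng, candidates, max_length)
        score = evaluate(pipeline)
        if score > best_score:
            best_pipeline, best_score = pipeline, score
    return best_pipeline, best_score


def mutate(rng: random.Random, pipeline: Pipeline, candidates: Sequence[str],
           max_length: int) -> Pipeline:
    """Apply one random edit: replace, insert, or delete a single step."""
    ops = ["replace"]
    if len(pipeline) < max_length:
        ops.append("insert")
    if len(pipeline) > 1:
        ops.append("delete")
    op = rng.choice(ops)
    steps = list(pipeline)
    if op == "replace":
        steps[rng.randrange(len(steps))] = rng.choice(candidates)
    elif op == "insert":
        steps.insert(rng.randint(0, len(steps)), rng.choice(candidates))
    else:
        del steps[rng.randrange(len(steps))]
    return tuple(steps)


def evolution_search(evaluate: Callable[[Pipeline], float], candidates: Sequence[str],
                     max_length: int, budget: int, population_size: int = 8,
                     tournament_size: int = 3, seed: int = 0) -> Tuple[Pipeline, float]:
    """Tournament selection plus mutation, discarding the oldest member each step."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    population = []
    for _ in range(min(population_size, budget)):
        pipeline = sample_pipeline(rng, candidates, max_length)
        population.append((evaluate(pipeline), pipeline))
    best_score, best_pipeline = max(population)
    for _ in range(budget - len(population)):
        contenders = rng.sample(population, min(tournament_size, len(population)))
        parent = max(contenders)[1]  # highest-scoring contender becomes the parent
        child = mutate(rng, parent, candidates, max_length)
        child_score = evaluate(child)
        population.append((child_score, child))
        population.pop(0)  # aging: remove the oldest member
        if child_score > best_score:
            best_score, best_pipeline = child_score, child
    return best_pipeline, best_score


def toy_score(pipeline: Pipeline) -> float:
    """Stand-in objective for the demo; NOT a real model score."""
    return float(pipeline[0] == "StandardScaler") - abs(len(pipeline) - 2)


print(random_search(toy_score, CANDIDATES, max_length=3, budget=50))
print(evolution_search(toy_score, CANDIDATES, max_length=3, budget=50))
```

In a realistic setting, `evaluate` would build the pipeline (for example with scikit-learn's `Pipeline`), fit a downstream model, and return a validation metric. Both searches spend the same evaluation budget, which is the kind of like-for-like comparison the findings above describe.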
Abstract
Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to data distribution, so feature preprocessing, which transforms features from one distribution to another, is a crucial step to ensure good model quality. Manually constructing a feature preprocessing pipeline is challenging because data scientists need to make difficult decisions about which preprocessors to select and in which order to compose them. In this paper, we study how to automate feature preprocessing (Auto-FP) for tabular data. Due to the large search space, a brute-force solution is prohibitively expensive. To address this challenge, we observe that Auto-FP can be modelled as either a hyperparameter optimization (HPO) or a neural architecture search (NAS) problem. This observation enables us to extend a variety of HPO and…
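A short calculation shows why brute force becomes expensive. The sketch below counts ordered pipelines when each step is chosen from a fixed set of preprocessors; the specific sizes used (7 preprocessors, maximum length 7) and the choice to allow repeated preprocessors are assumptions for illustration and are not stated in the text above.

```python
def count_pipelines(num_preprocessors: int, max_length: int) -> int:
    """Count ordered pipelines of length 1..max_length, with repeats allowed."""
    return sum(num_preprocessors ** length for length in range(1, max_length + 1))


# Assumed sizes for illustration only: 7 candidate preprocessors, up to 7 steps.
print(count_pipelines(7, 7))  # 960799
```

Each candidate pipeline requires training and validating a downstream model, so even a search space of this size is impractical to enumerate exhaustively.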
