SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions
Yin Lin, Bolin Ding, H. V. Jagadish, Jingren Zhou

TL;DR
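SMARTFEAT is an automated feature engineering tool that uses Foundation Models (FMs) to generate informative features. It selects a subset of operators instead of enumerating exhaustive operator combinations, and it interacts with the FM at the feature level rather than row by row. Together these choices reduce computational cost, and the tool is intended to help both experts and non-experts.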
Contribution
The paper introduces SMARTFEAT, a novel framework that uses Foundation Models and intelligent operator selection to improve automated feature construction for large datasets.
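To see why operator selection matters, compare the size of an exhaustive one-step candidate space with a selected subset. The sketch below is a toy illustration, not SMARTFEAT's implementation. `BINARY_OPS`, `exhaustive_candidates`, `selected_candidates`, `toy_selector`, and the example feature names are all hypothetical. The selector is a stub standing in for an FM-guided operator selector.

```python
from itertools import combinations
from math import comb

# Hypothetical operator set; real AutoFE tools use their own (often larger) sets.
BINARY_OPS = ["add", "subtract", "multiply", "divide"]

def exhaustive_candidates(features, ops=BINARY_OPS):
    """Enumerate every (op, f1, f2) over unordered feature pairs.

    Treating pairs as unordered undercounts non-commutative ops such as
    divide, so this is a lower bound on the exhaustive search space.
    """
    return [(op, a, b) for a, b in combinations(features, 2) for op in ops]

def selected_candidates(features, selector):
    """Keep only the (op, f1, f2) triples proposed by `selector` that
    reference existing features. `selector` is a hypothetical stand-in
    for an FM-guided operator selector."""
    known = set(features)
    return [(op, a, b) for op, a, b in selector(features) if a in known and b in known]

def toy_selector(features):
    # Hypothetical FM suggestion: rooms per household.
    return [("divide", "rooms", "households")]

features = ["age", "income", "rooms", "households", "population"]
exhaustive = exhaustive_candidates(features)
selected = selected_candidates(features, toy_selector)

print(len(exhaustive), len(BINARY_OPS) * comb(len(features), 2))  # 40 40
print(selected)  # [('divide', 'rooms', 'households')]
```

With only five features the exhaustive space already holds 40 one-step candidates. It grows quadratically with the feature count, and faster still if generated features are recombined. The selected list, by contrast, is only as long as what the selector proposes.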
Findings
Effective feature creation using Foundation Models.
Reduced API calls and computational costs.
Improved feature engineering efficiency for large datasets.
Abstract
Before applying data analytics or machine learning to a data set, a vital step is usually the construction of an informative set of features from the data. In this paper, we present SMARTFEAT, an efficient automated feature engineering tool to assist data users, even non-experts, in constructing useful features. Leveraging the power of Foundation Models (FMs), our approach enables the creation of new features from the data, based on contextual information and open-world knowledge. Our method incorporates an intelligent operator selector that discerns a subset of operators, effectively avoiding exhaustive combinations of original features, as is typically observed in traditional automated feature engineering tools. Moreover, we address the limitations of performing data tasks through row-level interactions with FMs, which could lead to significant delays and costs due to excessive API…
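The cost difference between row-level and feature-level FM interaction can be illustrated by counting model requests. The sketch assumes that a feature-level interaction asks the model once for a reusable transformation, which is then applied locally to every row. It is not SMARTFEAT's implementation. `CountingModel`, the `ROW:`/`FEATURE:` prompt formats, and the ratio-expression reply are hypothetical stubs, so the example runs offline.

```python
class CountingModel:
    """Hypothetical Foundation Model client stub that counts requests.

    A real FM call would go over an API; this stub answers two toy prompt
    formats so the call counts can be compared offline.
    """

    def __init__(self):
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        if prompt.startswith("ROW:"):
            # Row-level: the "model" computes the new value for one row.
            rooms, households = map(float, prompt[len("ROW:"):].split(","))
            return str(rooms / households)
        # Feature-level: the "model" returns a reusable expression once.
        return "rooms / households"

def row_level(model, rows):
    # One request per row: cost grows linearly with the number of rows.
    return [float(model.complete(f"ROW:{r['rooms']},{r['households']}")) for r in rows]

def feature_level(model, rows):
    # One request per new feature; the returned ratio expression is parsed
    # (no eval) and applied locally to every row.
    expr = model.complete("FEATURE: ratio of rooms to households")
    num, den = (name.strip() for name in expr.split("/"))
    return [r[num] / r[den] for r in rows]

rows = [{"rooms": 6, "households": 2}, {"rooms": 9, "households": 3}, {"rooms": 4, "households": 4}]

row_model, feature_model = CountingModel(), CountingModel()
row_values = row_level(row_model, rows)
feature_values = feature_level(feature_model, rows)

print(row_values == feature_values)          # True
print(row_model.calls, feature_model.calls)  # 3 1
```

Both approaches produce the same values here. The row-level version issues one request per row, while the feature-level version issues one request per new feature, however many rows the table has.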
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Scientific Computing and Data Management
