Toward Efficient Automated Feature Engineering

Kafeng Wang; Pengyang Wang; Chengzhong Xu

arXiv:2212.13152·cs.LG·December 27, 2022

TL;DR

This paper introduces a reinforcement learning-based framework for automated feature engineering that significantly improves efficiency and maintains high effectiveness, enabling faster deployment on large-scale datasets.

Contribution

The work proposes a novel AFE framework with a feature pre-evaluation model and a two-stage policy training strategy to enhance efficiency without sacrificing performance.

Findings

1. Achieved 2.9% higher average performance
2. Doubled the computational efficiency compared to existing methods
3. Validated on 36 datasets for classification and regression

Abstract

Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks, and it has achieved great success in real-world applications. Current AFE methods mainly focus on improving the effectiveness of the produced features but ignore the low-efficiency issue in large-scale deployment. Therefore, in this work, we propose a generic framework to improve the efficiency of AFE. Specifically, we construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent that performs feature transformation and selection, and the evaluation score of the produced features in downstream tasks serves as the reward to update the policy. We improve the efficiency of AFE from two perspectives. On the one hand, we develop a Feature Pre-Evaluation (FPE) Model to reduce the sample size and feature size that are two…
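
The reinforcement learning setup described in the abstract (one agent per feature, with the downstream evaluation score as the reward) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the action set, the softmax (gradient-bandit) policy, the correlation-based `evaluate` function standing in for a real downstream model, and all names such as `FeatureAgent` and `train`. The FPE model and the two-stage training strategy are not modeled.

```python
import math
import random

# Hypothetical action set: each agent transforms its feature or drops it.
ACTIONS = {
    "keep": lambda x: x,
    "square": lambda x: x * x,
    "log1p_abs": lambda x: math.log1p(abs(x)),
    "drop": None,  # feature selection: exclude this feature
}
ACTION_NAMES = list(ACTIONS)


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def evaluate(columns, target):
    """Stand-in downstream score (assumption): mean |corr| of kept features."""
    if not columns:
        return 0.0
    return sum(abs_corr(c, target) for c in columns) / len(columns)


class FeatureAgent:
    """Softmax policy over ACTIONS for a single feature."""

    def __init__(self, lr=0.5):
        self.prefs = [0.0] * len(ACTION_NAMES)
        self.lr = lr

    def probs(self):
        m = max(self.prefs)
        exps = [math.exp(p - m) for p in self.prefs]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, rng):
        return rng.choices(range(len(ACTION_NAMES)), weights=self.probs())[0]

    def update(self, action, advantage):
        # Gradient-bandit update: raise the chosen action's preference
        # when the reward beats the running baseline, lower it otherwise.
        p = self.probs()
        for a in range(len(self.prefs)):
            grad = (1.0 if a == action else 0.0) - p[a]
            self.prefs[a] += self.lr * advantage * grad


def train(features, target, episodes=300, seed=0):
    """features: list of columns (lists of floats); returns one agent per column."""
    rng = random.Random(seed)
    agents = [FeatureAgent() for _ in features]
    baseline = 0.0
    for t in range(1, episodes + 1):
        actions = [agent.act(rng) for agent in agents]
        new_cols = []
        for col, a in zip(features, actions):
            fn = ACTIONS[ACTION_NAMES[a]]
            if fn is not None:
                new_cols.append([fn(v) for v in col])
        reward = evaluate(new_cols, target)  # shared reward for all agents
        for agent, a in zip(agents, actions):
            agent.update(a, reward - baseline)
        baseline += (reward - baseline) / t  # running mean of rewards
    return agents


if __name__ == "__main__":
    rng = random.Random(1)
    x0 = [rng.uniform(-2, 2) for _ in range(200)]
    x1 = [rng.gauss(0, 1) for _ in range(200)]  # pure-noise feature
    y = [v * v for v in x0]
    for i, agent in enumerate(train([x0, x1], y)):
        p = agent.probs()
        print(i, ACTION_NAMES[p.index(max(p))])
```

In this toy version every training episode calls `evaluate` on the full data. That repeated evaluation is the kind of cost the abstract identifies as the efficiency bottleneck, which is why a cheap stand-in score is used here instead of fitting a real model.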

Taxonomy

Topics: Machine Learning and Data Classification · Software Engineering Research · Anomaly Detection Techniques and Applications