Learned Feature Importance Scores for Automated Feature Engineering

Yihe Dong; Sercan Arik; Nathanael Yoder; Tomas Pfister

arXiv:2406.04153·cs.LG·June 7, 2024

Open Access

TL;DR

AutoMAN is an automated feature engineering framework that learns feature importance masks end-to-end. It targets high accuracy and low latency across diverse data settings and reduces manual effort in machine learning workflows.

Contribution

This paper introduces AutoMAN, a method that automates feature engineering by learning importance masks without explicitly transforming features. The method extends to heterogeneous and time-varying data.

Findings

01. Achieves state-of-the-art accuracy in feature engineering tasks.

02. Significantly reduces latency compared to existing methods.

03. Extends to support time series and heterogeneous data modalities.

Abstract

Feature engineering has demonstrated substantial utility for many machine learning workflows, such as in the small data regime or when distribution shifts are severe. Thus, automating this capability can relieve much manual effort and improve model performance. To this end, we propose AutoMAN, or Automated Mask-based Feature Engineering, an automated feature engineering framework that achieves high accuracy and low latency and can be extended to heterogeneous and time-varying data. AutoMAN is based on effectively exploring the space of candidate transforms without explicitly manifesting transformed features. This is achieved by learning feature importance masks, which can be extended to support other modalities such as time series. AutoMAN learns feature transform importance end-to-end, incorporating a dataset's task target directly into feature engineering, resulting in state-of-the-art…
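The idea of learning an importance mask over candidate transforms jointly with the task loss can be illustrated with a toy sketch. This is not the AutoMAN implementation: the transform set, the softmax mask, the linear prediction head, the synthetic data, and the name `fit_masked_linear` are all assumptions made for illustration. Unlike the approach described in the abstract, this sketch materializes every transformed column for simplicity.

```python
import numpy as np

# Hypothetical candidate transforms (assumption; not the paper's transform set).
TRANSFORMS = {"identity": lambda x: x, "square": np.square, "sin": np.sin}


def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()


def fit_masked_linear(X, y, steps=2000, lr=0.2):
    """Jointly learn a softmax mask over (feature, transform) pairs and
    linear weights by gradient descent on mean squared error."""
    d = X.shape[1]
    names = [(f"x{i}", t) for i in range(d) for t in TRANSFORMS]
    # Toy simplification: build every transformed column explicitly.
    Z = np.column_stack([f(X[:, i]) for i in range(d) for f in TRANSFORMS.values()])
    Z = (Z - Z.mean(0)) / Z.std(0)
    y = (y - y.mean()) / y.std()

    a = np.zeros(Z.shape[1])  # mask logits
    w = np.zeros(Z.shape[1])  # linear weights
    for _ in range(steps):
        g = softmax(a)                # importance mask, sums to 1
        r = Z @ (g * w) - y           # residual of the masked linear model
        c = Z.T @ r / len(y)          # gradient w.r.t. theta = g * w
        grad_w = g * c
        grad_a = g * (w * c - np.sum(g * w * c))  # chain rule through softmax
        w -= lr * grad_w
        a -= lr * grad_a
    return names, softmax(a)


# Synthetic data (assumption): the target depends only on the square of x0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=500)

names, mask = fit_masked_linear(X, y)
top = names[int(np.argmax(mask))]
print(top, round(float(mask.max()), 3))
```

Because the mask and the weights are trained against the same loss, the task target directly shapes which transform receives the most mass. On this synthetic data the largest mask entry should fall on the squared version of `x0`.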

Taxonomy

Topics: Machine Learning and Data Classification