Statistical Test for Auto Feature Engineering by Selective Inference

Tatsuya Matsukawa; Tomohiro Shiraishi; Shuichi Nishino; Teruyuki Katsuoka; Ichiro Takeuchi

arXiv:2410.19768 · stat.ML · October 29, 2024

TL;DR

This paper introduces a statistical test based on selective inference to evaluate the reliability of features generated by auto feature engineering algorithms, providing a way to control false discoveries.

Contribution

It proposes a novel statistical testing framework for features from AFE algorithms, enabling significance assessment with theoretical guarantees.

Findings

01 The test quantifies the significance of each generated feature with a p-value.

02 It controls the risk of false discoveries.

03 It applies to tree search-based AFE algorithms.

Abstract

Auto Feature Engineering (AFE) plays a crucial role in developing practical machine learning pipelines by automating the transformation of raw data into meaningful features that enhance model performance. By generating features in a data-driven manner, AFE enables the discovery of important features that may not be apparent through human experience or intuition. On the other hand, since AFE generates features based on data, there is a risk that these features may be overly adapted to the data, making it essential to assess their reliability appropriately. Unfortunately, because most AFE problems are formulated as combinatorial search problems and solved by heuristic algorithms, it has been challenging to theoretically quantify the reliability of generated features. To address this issue, we propose a new statistical test for features generated by AFE algorithms, based on a framework…
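
The concern raised in the abstract, that features generated from the data may be overly adapted to that same data, can be illustrated with a small simulation. The sketch below is not the paper's selective inference test. It uses a hypothetical stand-in for an AFE search (all pairwise products of raw columns, with the best-scoring one selected) and compares a naive z-test on the selected feature against plain data splitting, a simpler baseline that is valid but gives up half the data. The function names, sample sizes, and the known noise variance of 1 are all assumptions made for illustration.

```python
import math
import numpy as np


def candidate_features(X):
    """Hypothetical stand-in for an AFE search space: pairwise products of raw columns."""
    n, d = X.shape
    cols = [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)


def z_test_pvalue(x, y, sigma=1.0):
    """Two-sided z-test for feature x; valid only if x was chosen independently of y."""
    z = x @ y / (sigma * np.linalg.norm(x))
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_best(F, y):
    """Select the candidate with the largest absolute normalized inner product with y."""
    scores = np.abs(F.T @ y) / np.linalg.norm(F, axis=0)
    return int(np.argmax(scores))


def false_positive_rates(n_trials=500, n=100, d=5, alpha=0.05, seed=0):
    """Estimate how often each procedure rejects when no feature is truly related to y."""
    rng = np.random.default_rng(seed)
    naive_hits = 0
    split_hits = 0
    half = n // 2
    for _ in range(n_trials):
        X = rng.normal(size=(n, d))
        y = rng.normal(size=n)  # pure noise, so every generated feature is null
        F = candidate_features(X)

        # Naive: select and test on the same data.
        j = select_best(F, y)
        naive_hits += z_test_pvalue(F[:, j], y) < alpha

        # Data splitting: select on the first half, test on the held-out second half.
        j = select_best(F[:half], y[:half])
        split_hits += z_test_pvalue(F[half:, j], y[half:]) < alpha
    return naive_hits / n_trials, split_hits / n_trials


naive_fpr, split_fpr = false_positive_rates()
print(f"naive FPR: {naive_fpr:.3f}, split FPR: {split_fpr:.3f}")
```

Under these assumptions the naive false positive rate should come out well above the nominal 0.05, while the split-based rate should stay close to it. Selective inference aims for the same kind of validity without discarding data, by conditioning the test on the selection event.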

Code & Models

Repositories

takeuchi-lab-si-group/statistical_test_for_auto_feature_engineering (Official)

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Statistical Methods and Models