Statistical Test for Auto Feature Engineering by Selective Inference
Tatsuya Matsukawa, Tomohiro Shiraishi, Shuichi Nishino, Teruyuki Katsuoka, Ichiro Takeuchi

TL;DR
This paper introduces a statistical test based on selective inference to evaluate the reliability of features generated by auto feature engineering algorithms, providing a way to control false discoveries.
Contribution
It proposes a novel statistical testing framework for features from AFE algorithms, enabling significance assessment with theoretical guarantees.
Findings
The test quantifies feature significance with p-values.
It controls the risk of false discoveries among the generated features.
It applies to tree-search-based AFE algorithms.
Abstract
Auto Feature Engineering (AFE) plays a crucial role in developing practical machine learning pipelines by automating the transformation of raw data into meaningful features that enhance model performance. By generating features in a data-driven manner, AFE enables the discovery of important features that may not be apparent through human experience or intuition. On the other hand, since AFE generates features based on data, there is a risk that these features may be overly adapted to the data, making it essential to assess their reliability appropriately. Unfortunately, because most AFE problems are formulated as combinatorial search problems and solved by heuristic algorithms, it has been challenging to theoretically quantify the reliability of generated features. To address this issue, we propose a new statistical test for generated features by AFE algorithms based on a framework…
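The risk described in the abstract, that data-driven features can be overly adapted to the data, can be illustrated with a small simulation. The sketch below is not the paper's test or its AFE algorithm. It assumes a toy candidate set (raw columns, squares, and pairwise products), pure-noise data with no true signal, and a naive correlation p-value based on the Fisher z approximation. All function names are hypothetical. It compares how often a pre-specified feature and the best-looking searched feature are declared significant at level 0.05.

```python
import math
import random
from itertools import combinations


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def naive_p_value(feature, y):
    """Two-sided p-value for zero correlation (Fisher z approximation).

    This ignores any selection step, which is the point of the demo.
    """
    z = math.atanh(pearson_r(feature, y)) * math.sqrt(len(y) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def candidate_features(cols):
    """Toy search space (an assumption): raw columns, squares, products."""
    feats = list(cols)
    feats += [[v * v for v in c] for c in cols]
    feats += [[u * v for u, v in zip(c1, c2)] for c1, c2 in combinations(cols, 2)]
    return feats


def simulate(trials=300, n=60, d=5, alpha=0.05, seed=0):
    """Return rejection rates for a fixed feature and the selected feature."""
    rng = random.Random(seed)
    fixed_hits = selected_hits = 0
    for _ in range(trials):
        cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(d)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent of every feature
        p_values = [naive_p_value(f, y) for f in candidate_features(cols)]
        fixed_hits += p_values[0] < alpha      # feature chosen before seeing data
        selected_hits += min(p_values) < alpha  # feature chosen by searching
    return fixed_hits / trials, selected_hits / trials


fixed_rate, selected_rate = simulate()
print(f"pre-specified feature: {fixed_rate:.3f}")
print(f"best searched feature: {selected_rate:.3f}")
```

Under these assumptions the pre-specified feature should be rejected at a rate near the nominal level, while the searched feature is rejected far more often even though no feature carries signal. A selective-inference test is designed to correct for exactly this selection step.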
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Statistical Methods and Models
