Reinforcement Learning-based Feature Generation Algorithm for Scientific Data
Meng Xiao, Junfeng Zhou, Yuanchun Zhou

TL;DR
This paper introduces MAFG, a reinforcement learning-based multi-agent framework that automates high-order feature generation for scientific data, improving model performance without extensive domain expertise.
Contribution
It proposes a novel multi-agent reinforcement learning approach combined with large language models to automate and optimize feature generation in scientific data analysis.
Findings
Automates the feature generation process effectively.
Significantly improves downstream model performance.
Reduces reliance on domain-specific expertise.
Abstract
Feature generation (FG) aims to enhance the predictive potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data that improves downstream machine-learning model performance. Traditional methods face two challenges when generating features for scientific data. First, effectively constructing high-order feature combinations requires deep and extensive domain-specific expertise. Second, as the order of feature combinations increases, the search space expands exponentially, making manual exploration prohibitively labor-intensive. Advances in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened new avenues for automating feature generation. Inspired by this, the paper revisits the conventional feature…
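To make the search-space challenge in the abstract concrete, here is a minimal Python sketch of naive pairwise feature crossing. It is not the MAFG method. The operation set `BINARY_OPS` and the helpers `generate_round` and `candidate_count` are hypothetical names chosen for illustration, and the three-operation set is an assumption, since this summary does not list the operations the paper uses.

```python
from itertools import combinations
import operator

# Assumed operation set for illustration only; not taken from the paper.
BINARY_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def generate_round(features):
    """Cross every pair of existing features with every binary operation.

    `features` maps a feature name to a list of values (one per row).
    Returns a new dict holding the old features plus the generated ones.
    """
    new = dict(features)
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        for symbol, fn in BINARY_OPS.items():
            new[f"({name_a}{symbol}{name_b})"] = [
                fn(x, y) for x, y in zip(col_a, col_b)
            ]
    return new


def candidate_count(n_features, rounds, n_ops=len(BINARY_OPS)):
    """Upper bound on the feature-set size after `rounds` of exhaustive crossing.

    Each round proposes n_ops candidates per unordered pair of current
    features; candidates repeated across rounds are counted again, so this
    is a bound rather than an exact count.
    """
    count = n_features
    for _ in range(rounds):
        count += n_ops * (count * (count - 1) // 2)
    return count


if __name__ == "__main__":
    toy = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
    first = generate_round(toy)
    print(len(first), first["(a*b)"])
    print([candidate_count(3, r) for r in range(4)])
```

Under these assumptions, three original features grow to at most 12, 210, and 66045 candidates after one, two, and three rounds of exhaustive crossing. That growth is why enumerating high-order combinations by hand, or by brute force, quickly becomes impractical and why a guided search is attractive.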
Taxonomy
Topics: Machine Learning and Data Classification
