Reinforcement Learning-based Feature Generation Algorithm for Scientific Data

Meng Xiao; Junfeng Zhou; Yuanchun Zhou

arXiv:2507.03498 · cs.LG · July 10, 2025

TL;DR

This paper introduces MAFG, a reinforcement learning-based multi-agent framework that automates high-order feature generation for scientific data, improving model performance without extensive domain expertise.

Contribution

It proposes a novel multi-agent reinforcement learning approach combined with large language models to automate and optimize feature generation in scientific data analysis.

Findings

01. Automates the feature generation process effectively.

02. Significantly improves downstream model performance.

03. Reduces reliance on domain-specific expertise.

Abstract

Feature generation (FG) aims to enhance the predictive potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data that improves downstream machine-learning model performance. Traditional methods face two challenges when generating features for scientific data. First, constructing effective high-order feature combinations requires deep and extensive domain-specific expertise. Second, as the order of feature combinations increases, the search space expands exponentially, making manual exploration prohibitively labor-intensive. Advances in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened new avenues for automating feature generation. Inspired by this, this paper revisits the conventional feature…
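To make the search-space argument concrete, the following is a minimal Python sketch of brute-force feature crossing, not the MAFG method itself. The operator set, the column names (`mass`, `volume`, `temp`), and the helpers `cross_features` and `candidate_count` are all hypothetical and chosen only for illustration; the counting rule is a simplification that picks distinct features and one operator per join, ignoring operand order and nesting.

```python
from itertools import combinations
from math import comb

# Hypothetical binary operators; the source does not list MAFG's operator set.
OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def cross_features(columns):
    """Build every second-order feature from pairs of original columns."""
    generated = {}
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        for op_name, op in OPERATORS.items():
            key = f"{op_name}({name_a},{name_b})"
            generated[key] = [op(x, y) for x, y in zip(col_a, col_b)]
    return generated


def candidate_count(n_features, order, n_operators=len(OPERATORS)):
    """Rough candidate count: choose `order` distinct features,
    then one operator for each of the `order - 1` joins (assumed rule)."""
    return comb(n_features, order) * n_operators ** (order - 1)


# Toy tabular data with illustrative column names.
columns = {
    "mass": [1.0, 2.0],
    "volume": [2.0, 4.0],
    "temp": [300.0, 310.0],
}
second_order = cross_features(columns)

# How quickly the candidate pool grows with order for 20 original features.
growth = {order: candidate_count(20, order) for order in (2, 3, 4)}
```

Under these assumptions, three columns already yield nine second-order candidates, and the count for 20 columns rises sharply with each additional order. That rapid growth is what makes exhaustive enumeration impractical and motivates a learned search policy.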

Taxonomy

Topics: Machine Learning and Data Classification