CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization
Beicheng Xu, Keyao Ding, Wei Liu, Yupeng Lu, Bin Cui

TL;DR
CoFEH is an AutoML framework that combines LLM-driven feature engineering with Bayesian hyperparameter optimization for flexible, end-to-end model tuning.
Contribution
CoFEH introduces a collaborative, interleaved approach in which LLM-based feature engineering (FE) and Bayesian hyperparameter optimization (HPO) are mutually conditioned, going beyond methods that address isolated subtasks.
Findings
CoFEH outperforms traditional and LLM-based baselines in FE and joint FE+HPO tasks.
The mutual conditioning mechanism improves decision-making in feature engineering and hyperparameter tuning.
Dynamic optimizer selection enhances the robustness of the AutoML pipeline.
Abstract
Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which operate within rigid search spaces and lack domain awareness. While Large Language Models (LLMs) offer a promising alternative to generate unbounded operators with semantic reasoning, existing methods focus on isolated subtasks such as feature generation, falling short of free-form FE pipelines. Moreover, they are rarely coupled with hyperparameter optimization (HPO) of the downstream ML model, leading to greedy "FE-then-HPO" workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (TOT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to…
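As a rough illustration of what interleaving FE and HPO steps can look like, here is a minimal Python sketch. It is not the paper's algorithm: a random pick from a fixed candidate list stands in for the LLM-driven FE optimizer, a local random perturbation stands in for Bayesian optimization, the two steps alternate on a fixed schedule rather than being selected dynamically, and the objective is a toy function. All names (`FE_CANDIDATES`, `score`, `propose_fe`, `propose_lr`, `interleaved_search`) are hypothetical and chosen only for this example.

```python
import random

# Hypothetical toy FE pipelines; none of these names come from the CoFEH paper.
FE_CANDIDATES = ["raw", "log", "log+interactions"]


def score(fe, lr):
    """Toy validation score with an FE-HPO interaction:
    the best learning rate depends on which FE pipeline is used."""
    best_lr = {"raw": 0.1, "log": 0.3, "log+interactions": 0.6}[fe]
    bonus = {"raw": 0.0, "log": 0.05, "log+interactions": 0.1}[fe]
    return 0.8 + bonus - (lr - best_lr) ** 2


def propose_fe(rng):
    # Stand-in for an LLM-driven FE optimizer (assumption: random choice).
    return rng.choice(FE_CANDIDATES)


def propose_lr(rng, incumbent_lr):
    # Stand-in for Bayesian optimization (assumption: local random search),
    # clipped to the interval [0, 1].
    return min(1.0, max(0.0, incumbent_lr + rng.uniform(-0.2, 0.2)))


def interleaved_search(n_rounds=50, seed=0):
    rng = random.Random(seed)
    fe, lr = "raw", 0.5
    best = score(fe, lr)
    for step in range(n_rounds):
        if step % 2 == 0:
            # FE step: a new pipeline is evaluated under the current hyperparameter.
            cand_fe, cand_lr = propose_fe(rng), lr
        else:
            # HPO step: a new hyperparameter is evaluated under the current pipeline.
            cand_fe, cand_lr = fe, propose_lr(rng, lr)
        s = score(cand_fe, cand_lr)
        if s > best:  # keep the candidate only if it improves the incumbent
            fe, lr, best = cand_fe, cand_lr, s
    return fe, lr, best


if __name__ == "__main__":
    print(interleaved_search())
```

Because each step is scored with the other component's current incumbent, the search can follow the interaction between pipeline and hyperparameter, which a one-shot "FE-then-HPO" pass over the same toy objective would ignore.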
Taxonomy
Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Machine Learning and Algorithms
