Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, Jinwoo Shin

TL;DR
This paper introduces OCTree, a framework that combines large language models with decision tree reasoning to improve feature generation for tabular data, outperforming existing automated feature engineering methods.
Contribution
It presents a new LLM-based feature engineering approach that uses decision trees for iterative feedback, eliminating the need for predefined search spaces and enhancing model performance.
Findings
OCTree outperforms existing automated feature engineering methods.
The framework improves prediction accuracy across diverse benchmarks.
Decision tree reasoning effectively guides feature generation.
Abstract
In tabular prediction tasks, tree-based models combined with automated feature engineering methods often outperform deep learning approaches that rely on learned representations. While these feature engineering techniques are effective, they typically depend on a pre-defined search space and primarily use validation scores for feature selection, thereby missing valuable insights from previous experiments. To address these limitations, we propose a novel tabular learning framework that utilizes large language models (LLMs), termed Optimizing Column feature generator with decision Tree reasoning (OCTree). Our key idea is to leverage the reasoning capabilities of LLMs to identify effective feature generation rules without manually specifying the search space and provide language-based reasoning information highlighting past experiments as feedback for iterative rule improvements. We use…
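The iterative loop described in the abstract (propose a feature generation rule, evaluate it, and feed the results back as text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `propose_rule` is a hypothetical stub standing in for the LLM: it cycles through a fixed list of hand-written candidate rules (`x0_plus_x1`, `x0_div_x1`, both hypothetical) instead of generating them. The data are synthetic. For simplicity, one shallow scikit-learn decision tree serves both as the validation model and as the source of "reasoning": its rules are rendered to text with `sklearn.tree.export_text` and stored in the history that a real system would pass back to the LLM.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic tabular data (assumption): the label depends on the ratio x0 / x1,
# a diagonal boundary that a shallow axis-aligned tree fits poorly from raw columns.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(600, 2))
y = (X[:, 0] / X[:, 1] > 1.0).astype(int)

# Hypothetical candidate rules; in OCTree an LLM would propose these.
CANDIDATE_RULES = [
    ("x0_plus_x1", lambda X: X[:, 0] + X[:, 1]),
    ("x0_div_x1", lambda X: X[:, 0] / X[:, 1]),
]


def propose_rule(history):
    """Hypothetical stand-in for the LLM call. A real system would prompt an
    LLM with past rules, scores, and tree text, then parse a new rule."""
    return CANDIDATE_RULES[len(history) % len(CANDIDATE_RULES)]


def evaluate(X, y, feature_names, max_depth=2):
    """Fit a shallow tree; return its validation accuracy and its rules as text."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X_tr, y_tr)
    score = tree.score(X_val, y_val)
    reasoning = export_text(tree, feature_names=feature_names)
    return score, reasoning


def optimize(X, y, n_iters=2):
    """Iteratively add one proposed feature, record feedback, keep the best rule."""
    base_names = [f"x{i}" for i in range(X.shape[1])]
    base_score, _ = evaluate(X, y, base_names)
    best = (None, base_score)  # (rule name, validation score); None = raw features
    history = []
    for _ in range(n_iters):
        name, rule = propose_rule(history)
        X_new = np.column_stack([X, rule(X)])
        score, reasoning = evaluate(X_new, y, base_names + [name])
        # This record is the language-based feedback for the next proposal.
        history.append({"rule": name, "score": score, "tree": reasoning})
        if score > best[1]:
            best = (name, score)
    return best, history


best, history = optimize(X, y)
print("best rule:", best)
print(history[-1]["tree"])
```

Here the ratio rule lets a single split separate the classes, so it should win over the raw features and the sum rule. In a real pipeline the text in `history` would become part of the next LLM prompt rather than being printed.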
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
