ELF-Gym: Evaluating Large Language Models Generated Features for Tabular Prediction

Yanlin Zhang; Ning Li; Quan Gan; Weinan Zhang; David Wipf; Minjie Wang

arXiv:2410.12865·cs.CL·October 18, 2024

TL;DR

ELF-Gym introduces a framework for evaluating how well large language models generate features for tabular data, comparing their performance and similarity to human-crafted features across Kaggle datasets.

Contribution

This paper presents ELF-Gym, a novel evaluation framework that assesses LLM-generated features against expert features using performance impact and semantic similarity metrics.

Findings

01

In the best case, LLMs semantically capture about 56% of expert features

02

At the implementation level, overlap between LLM-generated and expert features drops to 13%

03

LLMs often fail on datasets requiring complex feature engineering

Abstract

Crafting effective features is a crucial yet labor-intensive and domain-specific task within machine learning pipelines. Fortunately, recent advancements in Large Language Models (LLMs) have shown promise in automating various data science tasks, including feature engineering. But despite this potential, evaluations thus far are primarily based on the end performance of a complete ML pipeline, providing limited insight into precisely how LLMs behave relative to human experts in feature engineering. To address this gap, we propose ELF-Gym, a framework for Evaluating LLM-generated Features. We curated a new dataset from historical Kaggle competitions, including 251 "golden" features used by top-performing teams. ELF-Gym then quantitatively evaluates LLM-generated features by measuring their impact on downstream model performance as well as their alignment with expert-crafted features…
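The alignment measurement described in the abstract can be illustrated with a minimal sketch. It assumes that some earlier step has already decided which expert ("golden") features are matched by at least one LLM-generated feature, once at the semantic level and once at the implementation level; how ELF-Gym makes those decisions is not stated here. The function name `golden_feature_recall` and all feature names are hypothetical and are not taken from the paper or its repository.

```python
from typing import Set


def golden_feature_recall(golden: Set[str], matched: Set[str]) -> float:
    """Return the fraction of golden features matched by any generated feature."""
    if not golden:
        raise ValueError("golden feature set must be non-empty")
    # Only count matches that refer to features actually in the golden set.
    return len(golden & matched) / len(golden)


# Hypothetical toy data: these names are illustrative, not from the ELF-Gym dataset.
golden = {
    "days_since_last_order",
    "mean_basket_size",
    "user_reorder_ratio",
    "hour_of_day_bucket",
}

# Assumed outputs of an upstream matching step (not shown here).
semantic_matches = {"days_since_last_order", "mean_basket_size", "user_reorder_ratio"}
implementation_matches = {"mean_basket_size"}

semantic_recall = golden_feature_recall(golden, semantic_matches)
implementation_recall = golden_feature_recall(golden, implementation_matches)

print(f"semantic recall:       {semantic_recall:.2f}")
print(f"implementation recall: {implementation_recall:.2f}")
```

A gap between the two numbers, as in this toy example, would mean that generated features often describe the same idea as an expert feature without computing the same values.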

Code & Models

Repositories

Lilyzhangyanlin/ELF-Gym
Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification