ReFuGe: Feature Generation for Prediction Tasks on Relational Databases with LLM Agents
Kyungho Kim, Geon Lee, Juyeon Kim, Dongwon Choi, Shinhwan Kang, and Kijung Shin

TL;DR
ReFuGe is an agentic framework that uses large language model agents to generate and filter relational features, improving prediction performance on relational database tasks without explicit supervision.
Contribution
This work presents ReFuGe, a framework that combines specialized LLM agents for schema selection, feature generation, and feature filtering to improve prediction tasks on relational databases.
Findings
ReFuGe outperforms existing methods on RDB benchmarks.
The iterative feedback loop improves feature quality and model accuracy.
The framework effectively handles complex schemas and large feature spaces.
Abstract
Relational databases (RDBs) play a crucial role in many real-world web applications, supporting data management across multiple interconnected tables. Beyond typical retrieval-oriented tasks, prediction tasks on RDBs have recently gained attention. In this work, we address this problem by generating informative relational features that enhance predictive performance. However, generating such features is challenging: it requires reasoning over complex schemas and exploring a combinatorially large feature space, all without explicit supervision. To address these challenges, we propose ReFuGe, an agentic framework that leverages specialized large language model agents: (1) a schema selection agent identifies the tables and columns relevant to the task, (2) a feature generation agent produces diverse candidate features from the selected schema, and (3) a feature filtering agent evaluates…
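To make the three stages in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy `customers`/`orders` tables, the function names, and the hand-written `relevant` mapping are all hypothetical, and simple deterministic functions stand in where ReFuGe would call LLM agents. The abstract is cut off before it describes how the filtering agent evaluates features, so the filter below uses a placeholder criterion (drop features that are constant across entities) purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy RDB: two tables linked by a customer_id foreign key.
schema = {
    "customers": ["customer_id"],
    "orders": ["order_id", "customer_id", "amount"],
}
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]


def select_schema(schema, relevant):
    """Stage (1) stand-in: keep only the tables and columns listed in `relevant`.

    In ReFuGe an LLM agent makes this choice; here it is passed in by hand.
    """
    return {
        table: [col for col in cols if col in relevant[table]]
        for table, cols in schema.items()
        if table in relevant
    }


def generate_features(rows, key, column):
    """Stage (2) stand-in: several candidate aggregations of `column` per `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {
        f"count_{column}": {k: float(len(v)) for k, v in groups.items()},
        f"sum_{column}": {k: sum(v) for k, v in groups.items()},
        f"mean_{column}": {k: mean(v) for k, v in groups.items()},
    }


def filter_features(candidates, entity_ids, default=0.0):
    """Stage (3) stand-in: placeholder criterion, NOT the paper's.

    Aligns each candidate to the entity list and drops constant features.
    """
    kept = {}
    for name, values in candidates.items():
        aligned = [values.get(entity, default) for entity in entity_ids]
        if len(set(aligned)) > 1:
            kept[name] = aligned
    return kept


selected = select_schema(schema, relevant={"orders": ["customer_id", "amount"]})
candidates = generate_features(orders, key="customer_id", column="amount")
entity_ids = [c["customer_id"] for c in customers]
features = filter_features(candidates, entity_ids)
```

Each entry of `features` is one column aligned with the rows of `customers`, ready to be appended to the entity table before training a downstream tabular model. Customer 3 has no orders, so its aggregates fall back to the `default` value.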
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Recommender Systems and Techniques
