RelAgent: LLM Agents as Data Scientists for Relational Learning
Xingyue Huang, Louis Tichelman, Jinwoo Kim, Krzysztof Olejniczak, İsmail İlkan Ceylan

TL;DR
RelAgent uses an LLM agent to construct SQL feature programs and select a classical model for relational learning. The resulting predictor runs without further LLM calls, giving scalable, interpretable, and fast predictions.
Contribution
This work introduces RelAgent, an LLM-based autonomous data scientist that combines SQL-based feature construction with classical models for relational learning.
Findings
RelAgent produces interpretable, SQL-based features.
The approach enables scalable deployment using standard databases.
RelAgent achieves fast, deterministic predictions.
Abstract
Relational learning is a challenging problem that has motivated a wide range of approaches, including graph-based models (e.g., graph neural networks, graph transformers), tabular methods (e.g., tabular foundation models), and sequence-based approaches (e.g., large language models), each with its own advantages and limitations. We propose RelAgent, an LLM-based autonomous data scientist for relational learning, which operates in two phases. In the search phase, an LLM agent uses database, validation, and evaluation workspace tools to construct SQL feature programs and select a predictive model. In the inference phase, the resulting program is executed without further LLM calls. The final predictor consists of SQL queries and a classical model, enabling fast, deterministic, and intrinsically interpretable predictions: features are human-readable queries, and predictions depend only on…
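To make the shape of the final predictor concrete, here is a minimal sketch of the inference phase described in the abstract: a SQL feature query followed by a classical model, with no LLM in the loop. Everything specific in it is an assumption for illustration and is not taken from the paper: the toy `customers`/`orders` schema, the hand-written `FEATURE_SQL` (a stand-in for a feature program the search phase might produce), the labels, and the one-feature threshold classifier (a decision stump) used as the classical model.

```python
import sqlite3

# Hypothetical toy schema; not taken from the paper.
SCHEMA = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount REAL
);
"""

# Hand-written stand-in for a feature program the search phase might emit.
# Each output column is a human-readable feature of one customer.
FEATURE_SQL = """
SELECT c.customer_id,
       COUNT(o.order_id)            AS n_orders,
       COALESCE(SUM(o.amount), 0.0) AS total_spent
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY c.customer_id
"""


def build_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?)",
                     [(i,) for i in range(1, 7)])
    orders = [(1, 1, 20.0), (2, 1, 35.0), (3, 2, 5.0), (4, 3, 60.0),
              (5, 3, 15.0), (6, 3, 10.0), (7, 5, 8.0)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.commit()
    return conn


def compute_features(conn):
    """Run the feature query; returns {customer_id: (n_orders, total_spent)}."""
    return {row[0]: (row[1], row[2]) for row in conn.execute(FEATURE_SQL)}


def fit_stump(xs, ys):
    """Choose threshold t minimizing training errors for 'predict 1 if x >= t'."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def predict(conn, threshold):
    """Inference: SQL features plus a fixed rule, so the result is deterministic."""
    feats = compute_features(conn)
    return {cid: int(total >= threshold) for cid, (_, total) in feats.items()}


if __name__ == "__main__":
    conn = build_db()
    feats = compute_features(conn)
    train_labels = {1: 1, 2: 0, 3: 1, 4: 0}  # made-up labels for illustration
    xs = [feats[cid][1] for cid in train_labels]
    ys = [train_labels[cid] for cid in train_labels]
    threshold = fit_stump(xs, ys)
    print("threshold:", threshold)
    print("predictions:", predict(conn, threshold))
```

In this sketch the prediction for a customer depends only on the output of the feature query and the learned threshold, which is what makes it easy to inspect: the rule can be read off as "predict 1 if `total_spent` is at least the threshold". A real pipeline would likely use several feature queries and a stronger classical model, but the structure would be the same.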
