Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Jingcheng Wu; Ratan Bahadur Thapa; Mojtaba Nayyeri; Lucas Etteldorf; Max Finkenbeiner; Fabian Leeske; Steffen Staab

arXiv:2605.16085 · cs.DB · May 18, 2026

TL;DR

This paper introduces a hybrid model that combines a language model with a graph neural network to learn from relational databases, and reports results that are competitive with some supervised baselines.

Contribution

It proposes a hybrid architecture that pairs a fine-tuned BART encoder, which captures the semantics within each row, with a GraphSAGE-based GNN over relational entity graphs (REGs), which injects the relational context between rows.
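To make "a GNN over relational entity graphs" concrete, here is a minimal Python sketch, assuming a toy schema: every row becomes a node, and every foreign-key reference becomes an undirected edge between the referencing and the referenced row. The tables, column names, and the `build_reg` and `serialize_row` helpers are hypothetical, and the "column: value" text format is only one plausible way to present a row to a language-model encoder. The paper's actual serialization and graph construction (including any handling of timestamps) may differ.

```python
# Hypothetical toy schema loosely modeled on a Formula 1 database.
# Each table maps a primary-key value to a row (column -> value).
tables = {
    "drivers": {
        1: {"driverId": 1, "name": "A. Driver", "nationality": "German"},
        2: {"driverId": 2, "name": "B. Driver", "nationality": "British"},
    },
    "races": {10: {"raceId": 10, "year": 2020, "circuit": "Monza"}},
    "results": {
        100: {"resultId": 100, "driverId": 1, "raceId": 10, "position": 3},
        101: {"resultId": 101, "driverId": 2, "raceId": 10, "position": None},
    },
}
# Foreign keys as (child_table, fk_column, parent_table).
foreign_keys = [("results", "driverId", "drivers"), ("results", "raceId", "races")]


def build_reg(tables, foreign_keys):
    """Relational entity graph as an adjacency dict: node = (table, primary key),
    undirected edge = one foreign-key reference between two rows."""
    adj = {(table, pk): set() for table, rows in tables.items() for pk in rows}
    for child, fk_col, parent in foreign_keys:
        for pk, row in tables[child].items():
            ref = row.get(fk_col)
            if ref is not None and ref in tables[parent]:  # skip nulls and dangling keys
                adj[(child, pk)].add((parent, ref))
                adj[(parent, ref)].add((child, pk))
    return adj


def serialize_row(table, row):
    """Flatten a row into 'table | col: value, ...' text for a language-model encoder."""
    cells = ", ".join(f"{col}: {val}" for col, val in row.items())
    return f"{table} | {cells}"


reg = build_reg(tables, foreign_keys)
print(sorted(reg[("races", 10)]))  # [('results', 100), ('results', 101)]
print(serialize_row("drivers", tables["drivers"][1]))
# drivers | driverId: 1, name: A. Driver, nationality: German
```

Here the race node is linked to both result rows, so a message-passing layer can carry race information to each result and, through the results, on to the drivers.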

Findings

01

The GNN substantially enriches BART's row embeddings.

02

Achieves a ROC-AUC of 67.40 on the driver-dnf task of the rel-f1 dataset.

03

Performance is competitive with some supervised baselines.
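Finding 02 appears to report ROC-AUC on a 0 to 100 scale (67.40 corresponds to 0.674). As a reminder of what that number measures, the sketch below computes ROC-AUC for a binary task directly from its probabilistic definition: the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. The labels and scores are made up for illustration. This quadratic-time version favors clarity; scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


# Made-up predictions for five examples (1 = positive class).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.8, 0.7]
print(round(100 * roc_auc(labels, scores), 2))  # 66.67 on the 0-100 scale
```

A score of 50 corresponds to random ranking and 100 to a perfect separation of the two classes.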

Abstract

Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) for graph neural networks (GNNs), but remains task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This…
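The hybrid pipeline described in the abstract can be sketched in two stages: encode each row's text into a vector, then refine those vectors with GraphSAGE-style message passing over the REG. This minimal, dependency-free sketch assumes that the GNN consumes the encoder's row embeddings as its initial node features. Because running BART is out of scope here, `encode_row` is a hypothetical stand-in (a hashed bag of tokens), not BART. The layer follows the GraphSAGE mean-aggregation rule h_v' = ReLU(W_self·h_v + W_neigh·mean of neighbor embeddings). The toy rows, adjacency, identity weights, and embedding size are illustrative assumptions.

```python
import zlib

DIM = 8  # hypothetical toy size; BART-base hidden states are 768-dimensional


def encode_row(text, dim=DIM):
    """Stand-in for the fine-tuned BART encoder (NOT BART): a deterministic,
    L2-normalized hashed bag of tokens."""
    vec = [0.0] * dim
    for tok in text.lower().replace(",", " ").split():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


def matvec(W, x):
    """Dense matrix-vector product with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_layer(h, adj, W_self, W_neigh):
    """One GraphSAGE layer with mean aggregation:
    h_v' = ReLU(W_self h_v + W_neigh * mean_{u in N(v)} h_u).
    Nodes without neighbors aggregate a zero vector."""
    out = {}
    for v, hv in h.items():
        nbrs = adj.get(v, [])
        if nbrs:
            mean = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(hv))]
        else:
            mean = [0.0] * len(hv)
        pre = [a + b for a, b in zip(matvec(W_self, hv), matvec(W_neigh, mean))]
        out[v] = [max(0.0, x) for x in pre]  # ReLU
    return out


# Hypothetical toy REG: one driver row linked to two result rows via driverId.
rows = {
    ("drivers", 1): "drivers | name: A. Driver, nationality: German",
    ("results", 100): "results | driverId: 1, position: 3",
    ("results", 101): "results | driverId: 1, position: None",
}
adj = {
    ("drivers", 1): [("results", 100), ("results", 101)],
    ("results", 100): [("drivers", 1)],
    ("results", 101): [("drivers", 1)],
}

h0 = {v: encode_row(text) for v, text in rows.items()}  # stand-in "language model" embeddings
I = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]  # untrained identity weights
h1 = sage_layer(h0, adj, W_self=I, W_neigh=I)  # relationally enriched embeddings
print([round(x, 3) for x in h1[("drivers", 1)]])
```

With identity weights and non-negative inputs, the driver's new embedding is simply its own vector plus the average of its two result rows. In a real model, the weights would be learned, several layers would be stacked so that information travels multiple foreign-key hops, and a library layer such as PyTorch Geometric's `SAGEConv` would typically replace this hand-written loop.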
