Relational In-Context Learning via Synthetic Pre-training with Structural Prior

Yanbo Wang; Jiaxuan You; Chuan Shi; Muhan Zhang

arXiv:2603.03805·cs.LG·May 12, 2026

TL;DR

This paper introduces RDB-PFN, a relational foundation model trained entirely on synthetic data. It uses in-context learning to handle real-world relational tasks, sidestepping the scarcity of public relational databases.

Contribution

The authors develop RDB-PFN, the first relational foundation model trained solely on synthetic data, enabling instant adaptation to new databases through in-context learning.

Findings

01. RDB-PFN achieves strong few-shot performance on 19 real-world tasks.
02. It outperforms graph-based and single-table baselines when all models are given the same input format.
03. The model uses a lightweight architecture and offers fast inference.

Abstract

Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce, and structurally heterogeneous, making internet-scale pre-training infeasible. To overcome this data scarcity, we introduce RDB-PFN, the first relational foundation model trained purely via synthetic data. Inspired by Prior-Data Fitted Networks (PFNs), where synthetic data generated from Structural Causal Models (SCMs) enables reasoning on single tables, we design a Relational Prior Generator to create an infinite stream of diverse RDBs from scratch. Pre-training on over 2 million synthetic single-table and relational tasks, RDB-PFN learns to adapt to any new database instantly via genuine in-context learning. Experiments verify RDB-PFN…
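
To make the idea of a synthetic relational prior concrete, here is a minimal Python sketch. It is not the authors' Relational Prior Generator: the two-table schema, the linear SCM, the labeling rule, and every function name below are assumptions chosen only for illustration. It samples a parent table from a small random causal graph, attaches child rows through a foreign key, derives a label that depends on both tables, and splits the parent rows into a labeled context set and an unlabeled query set, which is the general shape of an in-context learning task.

```python
import random


def sample_scm_row(rng, weights):
    """Sample one row from a tiny linear SCM: each column depends on earlier columns."""
    row = []
    for w in weights:
        # w holds one coefficient per earlier column (a lower-triangular DAG).
        parent_effect = sum(c * v for c, v in zip(w, row))
        row.append(parent_effect + rng.gauss(0.0, 1.0))
    return row


def sample_synthetic_rdb(seed, n_parents=20, n_cols=3, max_children=4):
    """Sample a hypothetical two-table database: parents plus child rows linked by a key."""
    rng = random.Random(seed)
    # Random lower-triangular weights define the causal graph over parent columns.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(j)] for j in range(n_cols)]
    parents, children = [], []
    for pid in range(n_parents):
        feats = sample_scm_row(rng, weights)
        parents.append({"id": pid, "features": feats})
        for _ in range(rng.randint(0, max_children)):
            # Each child value depends causally on its parent's first feature.
            value = 0.5 * feats[0] + rng.gauss(0.0, 1.0)
            children.append({"parent_id": pid, "value": value})
    for p in parents:
        # The label mixes an aggregate over child rows with a parent feature,
        # so it cannot be predicted from the parent table alone.
        total = sum(c["value"] for c in children if c["parent_id"] == p["id"])
        p["label"] = int(total + p["features"][-1] > 0)
    return parents, children


def split_context_query(parents, n_context):
    """Keep labels on the context rows and hide them on the query rows."""
    context = parents[:n_context]
    query = [{k: v for k, v in p.items() if k != "label"} for p in parents[n_context:]]
    return context, query


parents, children = sample_synthetic_rdb(seed=0)
context, query = split_context_query(parents, n_context=15)
```

Calling `sample_synthetic_rdb` with a fresh seed yields a new database each time, which is how a prior of this kind could supply an unbounded stream of pre-training tasks. A model pre-trained this way would receive `context`, `children`, and `query` together and predict the hidden labels without any gradient updates.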

Code & Models

Repositories

MuLabPKU/RDBPFN
github

Datasets

yamboo/RDB_PFN
dataset · 1.5k downloads
