No Need to Train Your RDB Foundation Model
Linjie Xu, Yanlin Zhang, Quan Gan, Minjie Wang, David Wipf

TL;DR
This paper extends in-context learning (ICL) foundation models from single tables to relational databases (RDBs). A principled RDB encoder lets an existing ICL model make predictions across multiple tables without retraining, and the encoder is scalable and easy to implement.
Contribution
It presents a theoretically grounded, train-free RDB encoder compatible with existing ICL foundation models, facilitating multi-table predictive tasks without additional training.
Findings
The encoder preserves expressiveness without trainable parameters.
The approach achieves robust performance on unseen datasets.
SQL primitives enable scalable implementation.
Abstract
Relational databases (RDBs) contain vast amounts of heterogeneous tabular information that can be exploited for predictive modeling purposes. But since the space of potential targets is vast across enterprise settings, how can we avoid retraining a new model each time we wish to predict a new quantity of interest? Foundation models based on in-context learning (ICL) offer a convenient option, but so far are largely restricted to single-table operability. In generalizing to multiple interrelated tables, it is essential to compress variably-sized RDB neighborhoods into fixed-length ICL samples for consumption by the decoder. However, the details here are critical: unlike existing supervised learning RDB pipelines, we provide theoretical and empirical evidence that ICL-specific compression should be constrained within high-dimensional RDB columns where all entities share…
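To make the idea of compressing a variably-sized RDB neighborhood into a fixed-length sample concrete, here is a minimal sketch using SQL aggregation through Python's built-in sqlite3 module. It is an illustration only, not the paper's encoder: the users and orders tables, their columns, and the choice of COUNT, AVG, and MAX are all assumptions made for this example. Each aggregate is computed over a single column of the child table, which is one simple reading of compression "within" columns.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            user_id  INTEGER REFERENCES users(user_id),
            amount   REAL
        );
        INSERT INTO users VALUES (1), (2), (3);
        INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 15.0), (12, 2, 7.0);
        """
    )
    return conn


def encode_neighborhoods(conn):
    """Return one fixed-length row per user, however many orders the user has.

    Every aggregate operates on a single child-table column, so the output
    width depends only on the schema, not on the neighborhood size.
    """
    query = """
        SELECT u.user_id,
               COUNT(o.order_id) AS n_orders,
               AVG(o.amount)     AS mean_amount,
               MAX(o.amount)     AS max_amount
        FROM users AS u
        LEFT JOIN orders AS o ON o.user_id = u.user_id
        GROUP BY u.user_id
        ORDER BY u.user_id
    """
    return conn.execute(query).fetchall()


rows = encode_neighborhoods(build_demo_db())
for row in rows:
    print(row)
```

Users with two, one, and zero orders all produce rows of the same width. A user with no orders gets a count of 0 and NULL (None in Python) for the other aggregates. Rows like these could then be passed as samples to a single-table ICL model.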
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
