KumoRFM-2: Scaling Foundation Models for Relational Learning
Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, Matthias Fey

TL;DR
KumoRFM-2 is a scalable, pre-trained foundation model for relational data that supports both in-context learning and fine-tuning. It is reported to outperform supervised methods on benchmarks and to handle billion-scale datasets.
Contribution
Introduces KumoRFM-2, a novel relational foundation model that processes connected tables without flattening, leveraging synthetic and real data for superior performance.
Findings
Outperforms supervised approaches by up to 8% on benchmarks.
Surpasses previous models in cold start and noisy data scenarios.
Scales to billion-scale relational datasets.
Abstract
We introduce KumoRFM-2, the next iteration of a pre-trained foundation model for relational data. KumoRFM-2 supports in-context learning as well as fine-tuning and is applicable to a wide range of predictive tasks. In contrast to tabular foundation models, KumoRFM-2 natively operates on relational data, processing one or more connected tables simultaneously without manual table flattening or target variable generation, all while preserving temporal consistency. KumoRFM-2 leverages a large corpus of synthetic and real-world data to pre-train across four axes: the row and column dimensions at the individual table level, and the foreign key and cross-sample dimensions at the database level. In contrast to its predecessor, KumoRFM-2 injects task information as early as possible, enabling sharper selection of task-relevant columns and improved robustness to noisy data. Through extensive…
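To make the phrase "preserving temporal consistency" concrete, here is a minimal sketch of gathering context for one entity across two connected tables without flattening them. It is an illustration under stated assumptions, not KumoRFM-2's actual pipeline: the tables, column names, and the `temporal_context` helper are all hypothetical. The idea shown is that rows reachable through a foreign key are kept only if they are timestamped strictly before the prediction time, so nothing from the future leaks into the context.

```python
from datetime import date

# Hypothetical toy database: two tables linked by the foreign key
# orders.user_id -> users.id. None of these names come from the paper.
users = [
    {"id": 1, "country": "SI"},
    {"id": 2, "country": "DE"},
]
orders = [
    {"id": 10, "user_id": 1, "amount": 20.0, "ts": date(2024, 1, 5)},
    {"id": 11, "user_id": 1, "amount": 35.0, "ts": date(2024, 3, 1)},
    {"id": 12, "user_id": 2, "amount": 12.5, "ts": date(2024, 2, 10)},
]


def temporal_context(user_id, anchor_time):
    """Return the user row plus its linked orders strictly before anchor_time.

    The tables stay separate (no manual join into one wide row); the
    foreign key is followed at lookup time and future rows are dropped.
    """
    user = next(u for u in users if u["id"] == user_id)
    history = [
        o for o in orders
        if o["user_id"] == user_id and o["ts"] < anchor_time
    ]
    return {"user": user, "orders": history}


ctx = temporal_context(1, date(2024, 2, 1))
print([o["id"] for o in ctx["orders"]])  # prints [10]
```

Order 11 belongs to the same user but is dated after the anchor time, so it is excluded from the context.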
