TL;DR
GraphLand is a benchmark of diverse industrial graph datasets for evaluating graph ML models across real-world applications, addressing the narrow domain coverage of existing benchmarks.
Contribution
GraphLand provides 14 datasets for node property prediction drawn from a range of industrial applications, enabling broad evaluation of graph models in industrial contexts and exploration of the impact of distributional shifts.
Findings
GBDT models with graph features can be strong baselines.
Current graph foundation models underperform on diverse industrial datasets.
GraphLand reveals limitations of existing models in real-world scenarios.
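The first finding refers to gradient-boosted decision trees (GBDT) trained on node features that have been augmented with graph-derived features. The summary does not say which graph features the benchmark uses, so the following is only a minimal sketch under an assumption: each node's own features are extended with the mean of its neighbors' features and its degree. The function name `add_neighbor_features` and the toy data are hypothetical and used for illustration only.

```python
from collections import defaultdict


def add_neighbor_features(features, edges):
    """Append mean neighbor features and node degree to each node's row.

    Assumption for illustration: the graph is undirected and nodes are
    indexed 0..n-1. This is not necessarily the benchmark's own recipe.
    """
    n = len(features)
    dim = len(features[0]) if n else 0

    # Build an undirected adjacency list.
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    augmented = []
    for node in range(n):
        nbrs = neighbors[node]
        if nbrs:
            # Mean of each feature over the node's neighbors.
            mean = [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        else:
            # Isolated node: fall back to zeros.
            mean = [0.0] * dim
        augmented.append(list(features[node]) + mean + [float(len(nbrs))])
    return augmented


# Toy graph: a triangle (0, 1, 2) plus one isolated node (3).
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
edges = [(0, 1), (0, 2), (1, 2)]
augmented = add_neighbor_features(features, edges)
```

The augmented rows are ordinary tabular data, so they can be passed to any GBDT library as the feature matrix. The tree model itself never sees the graph; all structural information reaches it through the extra columns.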
Abstract
Although data that can be naturally represented as graphs is widespread in real-world applications across diverse industries, popular graph ML benchmarks for node property prediction only cover a surprisingly narrow set of data domains, and graph neural networks (GNNs) are often evaluated on just a few academic citation networks. This issue is particularly pressing in light of the recent growing interest in designing graph foundation models. These models are supposed to be able to transfer to diverse graph datasets from different domains, and yet the proposed graph foundation models are often evaluated on a very limited set of datasets from narrow applications. To alleviate this issue, we introduce GraphLand: a benchmark of 14 diverse graph datasets for node property prediction from a range of different industrial applications. GraphLand allows evaluating graph ML models on a wide range…