GraphNetz: Statistical Benchmarking of Graph Neural Networks with Paired Tests and Rank Aggregation

Kleyton da Costa; Bernardo Modenesi

arXiv:2605.09099·cs.CE·May 12, 2026

TL;DR

GraphNetz is a benchmarking framework for GNNs that emphasizes statistical rigor: it reports confidence intervals, paired tests, and rank aggregation to support fair and reproducible comparisons.

Contribution

The paper offers a standardized, statistically principled benchmarking pipeline for GNNs that covers multiple datasets, models, and tasks and generates its statistical reports automatically.

Findings

01. No significant difference among four canonical GNN encoders at α=0.05

02. The framework includes 63 dataset loaders, 4 task types, and 5 GNN architectures

03. Provides reproducible, statistically validated benchmarks for graph learning

Abstract

Graph Neural Network (GNN) benchmarks often report single point estimates, even when performance differences are small relative to the variation across random seeds, train/test splits, and datasets. Confidence intervals, paired comparisons, multiple-comparison correction, and rank-based aggregation are standard statistical tools, but they are rarely the default output of graph-learning benchmark suites. We introduce GraphNetz, a benchmarking framework whose default output is a structured statistical report rather than a raw accuracy table. GraphNetz currently includes 63 dataset loaders, four task types, and five canonical GNN architectures, and it also supports custom datasets and models. The framework standardizes multi-seed evaluation and automatically returns per-cell confidence intervals, Holm-corrected paired tests, and Friedman-Nemenyi critical-difference diagrams across tasks. In…
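Two of the tools named in the abstract, Holm correction and the rank aggregation behind the Friedman test, can be illustrated with a minimal pure-Python sketch. This is not GraphNetz code: the function names `holm_adjust`, `average_ranks`, and `friedman_statistic` are hypothetical, the input layout (one row of scores per dataset, one column per model, higher is better) is an assumption, and the Friedman statistic below omits the tie correction that statistical libraries usually apply.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for position, idx in enumerate(order):
        # The k-th smallest p-value (0-based k) is scaled by (m - k),
        # capped at 1, and forced to be non-decreasing along the order.
        candidate = min(1.0, (m - position) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def average_ranks(scores):
    """Mean rank of each model across datasets (rank 1 = best score).

    scores[d][k] is the score of model k on dataset d (assumed layout).
    Tied scores within a dataset share the mean of their positions.
    """
    n_models = len(scores[0])
    rank_sums = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=lambda k: -row[k])
        i = 0
        while i < n_models:
            j = i
            while j + 1 < n_models and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
            for t in range(i, j + 1):
                rank_sums[order[t]] += shared
            i = j + 1
    return [s / len(scores) for s in rank_sums]


def friedman_statistic(scores):
    """Friedman chi-square statistic, without tie correction."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * sum(r * r for r in ranks) - 3 * n * (k + 1)
```

With made-up inputs (not results from the paper), `holm_adjust([0.01, 0.04, 0.03])` scales the smallest p-value by 3 and the next by 2, then enforces monotonicity, while `average_ranks` yields the per-model mean ranks that a critical-difference diagram would plot. A Nemenyi post-hoc comparison would then check pairwise differences in those mean ranks against a critical difference, which this sketch does not compute.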
