TL;DR
This paper examines whether graph classification benchmarks actually distinguish GNNs from simpler methods, and proposes new metrics and methods to assess dataset quality and guide future benchmark development.
Contribution
It introduces a novel metric for dataset effectiveness, an empirical benchmarking protocol, and a synthetic dataset generation technique to improve graph learning benchmarks.
Findings
Existing benchmarks often fail to distinguish GNNs from simple methods.
The proposed metric agrees with intuition and with prior findings on dataset complexity.
Intrinsic graph properties influence dataset effectiveness and can be manipulated to build better benchmarks.
Abstract
Graph classification benchmarks, vital for assessing and developing graph neural networks (GNNs), have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: Do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? If so, how do we quantitatively measure this effectiveness? In response, we first propose an empirical protocol based on a fair benchmarking framework to investigate the performance discrepancy between simple methods and GNNs. We further propose a novel metric to quantify the dataset effectiveness by considering both dataset complexity and model performance. To the best of our knowledge, our work is the first to thoroughly study and provide an explicit definition for dataset effectiveness in the graph learning area. Through testing across 16 real-world…
Taxonomy
Methods: ALIGN
