A critical look at the evaluation of GNNs under heterophily: Are we really making progress?
Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova

TL;DR
This paper critically examines the evaluation of GNNs on heterophilous graphs, revealing dataset flaws, especially data leakage, and demonstrating that standard GNNs often outperform specialized models on better benchmarks.
Contribution
It identifies issues with existing heterophily datasets, proposes new heterophilous graph benchmarks, and shows that standard GNNs perform well on these, challenging prior assumptions.
Findings
Existing heterophily datasets, most notably Squirrel and Chameleon, contain many duplicate nodes, which causes train-test data leakage.
Standard GNNs outperform specialized models on new heterophilous benchmarks.
Removing duplicate nodes from Squirrel and Chameleon strongly affects GNN performance on these datasets.
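The leakage finding can be illustrated with a minimal sketch. It assumes, purely for illustration, that a "duplicate" is a test node whose feature row exactly matches a training node's feature row; the paper's own duplicate criterion may differ. The function name `find_leaked_test_nodes` and the toy data are hypothetical.

```python
import numpy as np

def find_leaked_test_nodes(features, train_idx, test_idx):
    """Return test node indices whose feature row exactly matches a training node.

    Assumption: duplicates are defined by identical feature vectors only.
    """
    # Hash each training row by its raw bytes for fast exact-match lookup.
    train_rows = {features[i].tobytes() for i in train_idx}
    return [int(i) for i in test_idx if features[i].tobytes() in train_rows]

# Toy example (made-up data): node 2 duplicates node 0.
features = np.array(
    [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1],
     [0, 0, 1]],
    dtype=np.int64,
)
train_idx = [0, 1]
test_idx = [2, 3]

leaked = find_leaked_test_nodes(features, train_idx, test_idx)
print(leaked)
```

A model that memorizes training nodes can classify such leaked test nodes without generalizing, which is why deduplicating before splitting can change measured performance.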
Abstract
Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized methods are required to achieve strong performance on such graphs. In this work, we challenge this assumption. First, we show that the standard datasets used for evaluating heterophily-specific models have serious drawbacks, making results obtained by using them unreliable. The most significant of these drawbacks is the presence of a large number of duplicate nodes in the datasets Squirrel and Chameleon, which leads to train-test data leakage. We show that removing duplicate nodes strongly affects…
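The abstract's definition of homophily (edges tend to connect nodes of the same class) can be made concrete with the edge homophily ratio, i.e. the fraction of edges whose endpoints share a label. This is one common measure, shown here as a sketch rather than as the measure the paper adopts; the function name `edge_homophily` and the toy graph are hypothetical.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints have the same class label."""
    edges = np.asarray(edges)    # shape (num_edges, 2)
    labels = np.asarray(labels)  # shape (num_nodes,)
    same_class = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same_class.mean())

# Toy graph (made-up data): 4 nodes, 4 edges, 2 classes.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
labels = [0, 0, 1, 1]

h = edge_homophily(edges, labels)
print(h)  # 2 of the 4 edges connect same-class nodes
```

Values near 1 indicate a homophilous graph, while low values indicate heterophily.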
