Are We Measuring Oversmoothing in Graph Neural Networks Correctly?
Kaicheng Zhang, Piero Deidda, Desmond Higham, Francesco Tudisco

TL;DR
This paper critiques traditional oversmoothing metrics in GNNs, proposing rank-based measures as more reliable indicators of model degradation, supported by extensive experiments and theoretical analysis.
Contribution
It introduces a novel rank-based metric for oversmoothing, demonstrating its effectiveness over traditional energy-based measures through comprehensive experiments and theoretical proofs.
Findings
Rank metrics reliably detect oversmoothing across architectures.
Energy-based metrics often fail to indicate performance drops.
Rank collapse correlates with GNN performance degradation.
Abstract
Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. We argue that these metrics have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks, while typical GNNs show a performance drop already with as few as 10 layers. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide extensive numerical evaluation across diverse graph architectures and datasets to show that rank-based metrics consistently capture…
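To make the two families of metrics in the abstract concrete, here is a minimal NumPy sketch. It rests on assumptions not stated in this summary: the Dirichlet energy is taken as trace(X^T (I - P) X), with P the symmetric-normalized adjacency with self-loops; the numerical rank is taken as the stable rank ||X||_F^2 / ||X||_2^2; and the effective rank is taken as the exponential of the entropy of the normalized singular values. The paper's exact definitions may differ. The function names, the 4-node path graph, and the linear propagation loop are purely illustrative.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    return A_hat / np.sqrt(np.outer(d, d))

def dirichlet_energy(X, P):
    """trace(X^T (I - P) X): energy w.r.t. the normalized Laplacian I - P (assumed form)."""
    L = np.eye(P.shape[0]) - P
    return float(np.trace(X.T @ L @ X))

def numerical_rank(X):
    """Stable rank ||X||_F^2 / ||X||_2^2 (assumed definition of numerical rank)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

def effective_rank(X, eps=1e-12):
    """exp(entropy) of the singular values normalized to sum to one (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))

# Toy setup (illustrative only): a 4-node path graph with random 3-dim features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = normalized_adjacency(A)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))

# Linear, weight-free propagation H <- P H, as a stand-in for stacking layers.
H = X.copy()
for layer in range(1, 11):
    H = P @ H
    print(layer, dirichlet_energy(H, P), numerical_rank(H), effective_rank(H))
```

Under these definitions, multiplying the features by a constant c multiplies the energy by c^2 but leaves both rank values unchanged, so the rank-based quantities do not depend on feature magnitude.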
Peer Reviews
Decision·ICLR 2026 Poster
1. This paper provides a strong theoretical proof of the decay of the rank as a measurement of oversmoothing. 2. Extensive experiments and ablation studies show that the rank-based metrics correlate better with performance degradation than energy-based measures, and that this extends to modern architectures with dropout, residuals, or normalization layers. 3. Unlike Dirichlet energy, which only shows meaningful trends for very deep networks, the proposed rank metrics can detect degradation even in mod
1. The discussion of related work is insufficient: many previous works have already shown that energy-based metrics are not a proper or sufficient measure of oversmoothing. For example, [1] shows that oversmoothing can be mitigated without explicit Dirichlet-energy-based control. 2. The paper shows that rank metrics are an effective measure of oversmoothing but does not provide practical techniques to mitigate it using the rank insight. [1] Y. Jin and X. Zhu, "Graph Rhythm Network: Beyond En
The paper's novel critique of widely used metrics is timely and well-supported, exposing flaws like scale-dependence and eigenspace reliance with examples and simplified theorems. Theoretical contributions, including proofs of rank convergence independent of feature magnitude, extend to GCNs/GATs, offer a unifying eigenvector perspective, and include some nonlinear analysis. Empirically, the extensive evaluation across homophilic/heterophilic datasets, depths, activations, and ablations robustly demo
The theoretical analysis is restricted to linear/nonnegative models or shared eigenvectors, limiting generalizability to signed graphs or complex activations. Experiments focus solely on node classification, neglecting graph/edge-level tasks or larger/dynamic graphs, and don't prove causation between rank decay and performance. While rank metrics excel in trained settings, the paper somewhat overlooks scenarios where energy metrics succeed (e.g., untrained asymptotics) and lacks discussion of co
The paper is overall clearly written. It clearly points out the failure modes of energy metrics. For example, it shows that these metrics are informative mainly in the limit, can be confounded by scaling, and rely on a fixed dominant eigenspace, an assumption that is often violated by trained GNNs. It also provides a unifying theoretical perspective for previous results on the oversmoothing phenomenon measured by Dirichlet-like energies. The treatment of the nonlinear case with the Hilbert-metric argument is technically interesting.
Despite strong empirical results, the paper lacks a theoretical justification that rank‑based metrics dominate Dirichlet‑style energy measures as oversmoothing indicators in trained GNNs. The analysis shows when energies vanish and that ranks can collapse, but offers no formal comparison, dominance result, or sensitivity guarantee for rank.
Taxonomy
Topics: Neural Networks and Applications
