Are We Measuring Oversmoothing in Graph Neural Networks Correctly?

Kaicheng Zhang; Piero Deidda; Desmond Higham; Francesco Tudisco

arXiv:2502.04591·cs.LG·February 24, 2026

TL;DR

This paper critiques traditional oversmoothing metrics in GNNs, proposing rank-based measures as more reliable indicators of model degradation, supported by extensive experiments and theoretical analysis.

Contribution

It introduces rank-based metrics for oversmoothing (the numerical and effective rank of the node-feature matrix). Through comprehensive experiments and theoretical proofs, it shows that these metrics track performance degradation more reliably than traditional energy-based measures.
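The following is a minimal sketch of the two rank measures named in the abstract. It assumes the standard textbook definitions, which are not quoted from the paper. The numerical (stable) rank is ||X||_F^2 / ||X||_2^2. The effective rank of Roy and Vetterli is the exponential of the Shannon entropy of the singular values normalized to sum to 1. Here X is an n x d node-feature matrix, and the function names are illustrative.

```python
import numpy as np


def numerical_rank(X: np.ndarray) -> float:
    """Numerical (stable) rank ||X||_F^2 / ||X||_2^2 (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
    if s[0] == 0.0:
        return 0.0  # zero matrix: no informative direction at all
    return float(np.sum(s**2) / s[0] ** 2)


def effective_rank(X: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank: exp of the entropy of the normalized singular values (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)
    total = s.sum()
    if total == 0.0:
        return 0.0
    p = s / total
    p = p[p > eps]  # drop numerically zero singular values (0 * log 0 is taken as 0)
    return float(np.exp(-np.sum(p * np.log(p))))


full = np.eye(5)                                   # five equally strong feature directions
collapsed = np.outer(np.ones(5), [1.0, 2.0, 3.0])  # every node carries the same feature vector
print(numerical_rank(full), effective_rank(full))            # both 5, up to rounding
print(numerical_rank(collapsed), effective_rank(collapsed))  # both 1, up to rounding
```

Both measures depend only on ratios of singular values. Multiplying X by a nonzero constant therefore leaves them unchanged.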

Findings

1. Rank metrics reliably detect oversmoothing across architectures.
2. Energy-based metrics often fail to indicate performance drops.
3. Rank collapse correlates with GNN performance degradation.
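The third finding can be illustrated in the simplest possible setting. The sketch below assumes a toy model: a weight-free, purely linear GCN that repeatedly applies the propagation matrix D^{-1/2}(A + I)D^{-1/2} to random features on a hypothetical 6-node cycle graph. The numerical rank is taken as ||X||_F^2 / ||X||_2^2, which is an assumed definition. The sketch shows rank collapse only. It is not one of the trained architectures evaluated in the paper, and it measures no accuracy.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """GCN-style propagation matrix D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def numerical_rank(X: np.ndarray) -> float:
    """Stable rank ||X||_F^2 / ||X||_2^2 (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)


def rank_trajectory(A: np.ndarray, X: np.ndarray, num_layers: int) -> list[float]:
    """Numerical rank of the features after 0, 1, ..., num_layers propagation steps."""
    P = normalized_adjacency(A)
    ranks = [numerical_rank(X)]
    for _ in range(num_layers):
        X = P @ X  # one linear, weight-free GCN layer
        ranks.append(numerical_rank(X))
    return ranks


# Hypothetical toy graph: a cycle on 6 nodes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

rng = np.random.default_rng(0)
X0 = rng.standard_normal((n, 4))
ranks = rank_trajectory(A, X0, num_layers=50)
for layer in (0, 1, 5, 10, 50):
    print(f"layer {layer:2d}: numerical rank {ranks[layer]:.4f}")
```

On a connected graph with self-loops, the leading eigenvalue of this propagation matrix is 1 and every other eigenvalue has magnitude below 1. Repeated propagation therefore pulls all feature columns toward the leading eigenvector, and the numerical rank approaches 1. For this cycle the next-largest eigenvalue magnitude is 2/3, so the deviation from rank 1 shrinks geometrically with depth.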

Abstract

Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. We argue that these metrics have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks, while typical GNNs show a performance drop already with as few as 10 layers. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide extensive numerical evaluation across diverse graph architectures and datasets to show that rank-based metrics consistently capture…
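The abstract names the Dirichlet energy as the standard similarity-based metric. This minimal sketch assumes the unnormalized variant E(X) = sum over edges (i, j) of ||x_i - x_j||^2 = tr(X^T L X), with the combinatorial Laplacian L = D - A. Published variants also normalize by node degrees or by the number of nodes, and the exact form used in the paper is not given here. The sketch computes the energy both as an edge sum and through the Laplacian, on a hypothetical 4-node cycle graph.

```python
import numpy as np


def dirichlet_energy_edges(X: np.ndarray, edges: list[tuple[int, int]]) -> float:
    """Sum of squared feature differences over undirected edges (each edge listed once)."""
    return float(sum(np.sum((X[i] - X[j]) ** 2) for i, j in edges))


def dirichlet_energy_laplacian(X: np.ndarray, A: np.ndarray) -> float:
    """The same quantity written as tr(X^T L X) with L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))


# Hypothetical 4-node cycle graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))
print(dirichlet_energy_edges(X, edges), dirichlet_energy_laplacian(X, A))  # equal up to rounding

# Identical features on every node (complete smoothing) give zero energy.
X_const = np.tile(rng.standard_normal(3), (4, 1))
print(dirichlet_energy_edges(X_const, edges))  # 0.0
```

The energy is zero exactly when features are constant on each connected component. It is small when neighbouring features are nearly identical, which is the regime these metrics are meant to detect.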

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. This paper provides a strong theoretical proof of rank decay as a measure of oversmoothing.
2. Extensive experiments and ablation studies show that the rank-based metrics correlate better with performance degradation than energy-based measures. The results also extend to modern architectures with dropout, residual connections, or normalization layers.
3. Unlike Dirichlet energy, which only shows meaningful trends for very deep networks, the proposed rank metrics can detect degradation even in mod…

Weaknesses

1. The related-work discussion is insufficient, because many previous works have already shown that energy-based metrics are not a proper or sufficient measure of oversmoothing. For example, [1] shows that oversmoothing can be mitigated without explicit Dirichlet-energy-based control.
2. The paper shows that rank metrics are an effective measure of oversmoothing, but it does not provide practical techniques that use the rank insight to mitigate it.
[1] Y. Jin and X. Zhu, "Graph Rhythm Network: Beyond En…

Reviewer 02 · Rating 4 · Confidence 2

Strengths

The paper's novel critique of widely used metrics is timely and well supported. It exposes flaws such as scale dependence and reliance on a fixed eigenspace through examples and simplified theorems. The theoretical contributions, including proofs of rank convergence independent of feature magnitude, extend to GCNs and GATs, offer a unifying eigenvector perspective, and include some nonlinear analysis. Empirically, the extensive evaluation across homophilic/heterophilic datasets, depths, activations, and ablations robustly demo…

Weaknesses

The theoretical analysis is restricted to linear/nonnegative models or models with shared eigenvectors, which limits generalizability to signed graphs or complex activations. Experiments focus solely on node classification and neglect graph- and edge-level tasks as well as larger or dynamic graphs. They also do not establish causation between rank decay and performance. While rank metrics excel in trained settings, the paper somewhat overlooks scenarios where energy metrics succeed (e.g., untrained asymptotics) and lacks discussion of co…

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The paper is overall clearly written. It pinpoints the failure modes of energy metrics. For example, it shows that these metrics are informative mainly in the limit, can be confounded by scaling, and rely on a fixed dominant eigenspace, an assumption that trained GNNs often violate. It also provides a unifying theoretical perspective on previous results about oversmoothing as measured by Dirichlet-like energies. The nonlinear analysis based on a Hilbert-metric argument is technically interesting.
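The scaling confound can be reproduced in a few lines. In the hypothetical sketch below, features are multiplied by 0.5 at every "layer" without changing their direction. This artificial contraction is chosen only to isolate the effect of scale. The Dirichlet energy, in the unnormalized form tr(X^T (D - A) X) (an assumed variant), decays like 0.25^k. The numerical rank ||X||_F^2 / ||X||_2^2 (an assumed definition) is invariant to rescaling and does not move.

```python
import numpy as np


def dirichlet_energy(X: np.ndarray, A: np.ndarray) -> float:
    """Unnormalized Dirichlet energy tr(X^T (D - A) X) (assumed variant)."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))


def numerical_rank(X: np.ndarray) -> float:
    """Stable rank ||X||_F^2 / ||X||_2^2; unchanged when X is multiplied by any c != 0."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)


# Hypothetical path graph on 5 nodes with random features.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
rng = np.random.default_rng(2)
X = rng.standard_normal((n, 3))

energies, ranks = [], []
for k in range(0, 31, 10):
    Xk = 0.5**k * X  # shrink feature norms, keep feature directions fixed
    energies.append(dirichlet_energy(Xk, A))
    ranks.append(numerical_rank(Xk))
    print(f"k={k:2d}  energy={energies[-1]:.3e}  numerical rank={ranks[-1]:.4f}")
```

Here the energy goes to zero although the relative arrangement of node features never changes. A vanishing energy on its own therefore does not show that node features have become similar.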

Weaknesses

Despite strong empirical results, the paper lacks a theoretical justification that rank‑based metrics dominate Dirichlet‑style energy measures as oversmoothing indicators in trained GNNs. The analysis shows when energies vanish and that ranks can collapse, but offers no formal comparison, dominance result, or sensitivity guarantee for rank.

Taxonomy

Topics: Neural Networks and Applications