The Oversmoothing Fallacy: A Misguided Narrative in GNN Research
MoonJeong Park, Sunghyun Choi, Jaeseung Heo, Eunhyeok Park, Dongwoo Kim

TL;DR
This paper challenges the common belief that oversmoothing limits deep GNNs, showing that misconceptions have hindered exploration of deeper architectures and that classical solutions can enable deeper GNNs without performance loss.
Contribution
The paper clarifies misconceptions about oversmoothing in GNNs, demonstrating that oversmoothing is often confused with vanishing gradients and that deep GNNs can perform well with proper techniques.
Findings
Oversmoothing is often mistaken for vanishing gradients.
Skip connections and normalization enable deep GNNs.
Deep GNNs can perform well without oversmoothing issues.
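As a rough illustration of the second finding, the sketch below stacks a toy message-passing layer with and without a skip connection followed by normalization. Everything in it is an assumption made for this example and not taken from the paper: the three-node path graph, the scalar weight `W`, the depth of 64, and the helper names. It only shows that the forward signal stays non-degenerate; it says nothing about trained accuracy.

```python
import math

# Row-normalized adjacency (with self-loops) of a 3-node path graph: 0 - 1 - 2.
ADJ = [
    [1 / 2, 1 / 2, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 1 / 2, 1 / 2],
]
W = 0.5  # hypothetical scalar weight standing in for a learned weight matrix


def aggregate(x):
    """Average each node's feature with its neighbours' (one scalar per node)."""
    return [sum(a * v for a, v in zip(row, x)) for row in ADJ]


def normalize(x, eps=1e-12):
    """Standardize the feature across nodes: zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def plain_layer(x):
    """Aggregation -> linear transformation -> ReLU, with no skip connection."""
    return [max(0.0, W * v) for v in aggregate(x)]


def residual_norm_layer(x):
    """The same layer, plus a skip connection followed by normalization."""
    out = [xi + hi for xi, hi in zip(x, plain_layer(x))]
    return normalize(out)


def run(layer, x, depth):
    """Apply `layer` to the node features `depth` times."""
    for _ in range(depth):
        x = layer(x)
    return x


def spread(x):
    """Largest difference between any two node features."""
    return max(x) - min(x)


x0 = [1.0, 2.0, 4.0]
plain = run(plain_layer, x0, 64)
resnorm = run(residual_norm_layer, x0, 64)
print("plain spread:      ", spread(plain))
print("skip + norm spread:", spread(resnorm))
```

In this toy setting the plain stack drives every feature toward zero, while the skip-plus-normalization stack keeps the nodes distinguishable at the same depth. Standardizing across nodes is only one possible normalization; the choice here is for brevity.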
Abstract
Oversmoothing has been recognized as a main obstacle to building deep Graph Neural Networks (GNNs), limiting the performance. This position paper argues that the influence of oversmoothing has been overstated and advocates for a further exploration of deep GNN architectures. Given the three core operations of GNNs, aggregation, linear transformation, and non-linear activation, we show that prior studies have mistakenly confused oversmoothing with the vanishing gradient, caused by transformation and activation rather than aggregation. Our finding challenges prior beliefs about oversmoothing being unique to GNNs. Furthermore, we demonstrate that classical solutions such as skip connections and normalization enable the successful stacking of deep GNN layers without performance degradation. Our results clarify misconceptions about oversmoothing and shed new light on the potential of deep…
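The distinction the abstract draws between aggregation and transformation can be sketched on a toy stack of layers of the form ReLU(w · P x), where P is a row-normalized adjacency matrix. The graph, the scalar weight `w`, the depth, and the function names below are assumptions made for this illustration, not the paper's setup. The code backpropagates the gradient of the summed output with respect to the input by hand, once with aggregation only (w = 1) and once with a contractive weight (w = 0.5).

```python
# Row-normalized adjacency (with self-loops) of a 3-node path graph: 0 - 1 - 2.
P = [
    [1 / 2, 1 / 2, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 1 / 2, 1 / 2],
]
N = len(P)


def forward(x, w, depth):
    """Stack `depth` layers of ReLU(w * P x); return the output and ReLU masks."""
    masks = []
    for _ in range(depth):
        h = [sum(a * v for a, v in zip(row, x)) for row in P]  # aggregation
        z = [w * v for v in h]                                 # linear transformation
        masks.append([1.0 if v > 0 else 0.0 for v in z])
        x = [max(0.0, v) for v in z]                           # activation
    return x, masks


def input_gradient(masks, w):
    """Backpropagate d(sum of outputs)/d(input) through the stack by hand."""
    g = [1.0] * N
    for mask in reversed(masks):
        g = [w * m * gi for m, gi in zip(mask, g)]  # through ReLU and the weight
        g = [sum(P[i][j] * g[i] for i in range(N)) for j in range(N)]  # P^T g
    return g


x0 = [1.0, 2.0, 4.0]
depth = 32

# Aggregation only: w = 1, and ReLU is inactive because all features stay positive.
_, masks_agg = forward(x0, 1.0, depth)
grad_agg = input_gradient(masks_agg, 1.0)

# Aggregation + contractive transformation + ReLU (w = 0.5 is a hypothetical value).
_, masks_full = forward(x0, 0.5, depth)
grad_full = input_gradient(masks_full, 0.5)

print("gradient sum, aggregation only:   ", sum(grad_agg))
print("gradient sum, with transformation:", sum(grad_full))
```

Because each row of P sums to one, the total gradient passing through pure aggregation is preserved at any depth, whereas the weight scales it by w at every layer. Whether gradients vanish in a real network depends on the learned weights and initialization; the value 0.5 is chosen only to make the effect visible.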
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper challenges a core assumption in GNN research, arguing that the impact of oversmoothing has been overstated and that the main issue is the vanishing gradient, which is caused by transformation and activation rather than aggregation. 2. The authors carry out a systematic component analysis, isolating the effects of aggregation, transformation, and activation, and clearly show where oversmoothing actually arises under different combinations of components. 3. The experimental results demonstrate that deep GNNs with ba
1. Although the authors empirically show the effects of aggregation, transformation, and activation on oversmoothing, the paper provides limited theoretical proof of why aggregation has only a marginal impact. 2. Skip connections and normalization are widely known solutions to oversmoothing. While the findings encourage deeper models, the paper provides limited guidance on how to design or train them in practice beyond skip connections and normalization.
1. The authors address, albeit not for the first time, a common misconception regarding performance degradation in GNNs with depth, i.e., that oversmoothing is the sole culprit; in practice, vanishing gradients play a major role. This is a pertinent problem to highlight. 2. The paper clarifies the cause of confusion between the oversmoothing definitions for GCNs and GATs, and how that confusion has affected commonly used metrics for measuring oversmoothing.
1. Previous studies have identified similar causes, primarily vanishing gradients and training problems, rather than oversmoothing, as crucial factors in degraded performance with GNN depth [1,2,3,4]. The missing relevant literature should be discussed. In fact, some literature with the same insights as this paper is already mentioned in the related work. This also calls the novelty and contribution of the paper into question. 2. In section 3.2, the authors discuss initialization and
- Interesting study showcasing that over-smoothing is sometimes confused with zero collapsing in related work
- Selecting the three base components of a GNN, and showing that aggregation does not play a significant role in over-smoothing
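The confusion between over-smoothing and zero collapsing mentioned in this review can be made concrete with a smoothness metric. The sketch below uses Dirichlet energy, a widely used measure of how much features differ across edges; whether the paper relies on exactly this metric is not stated in the text here, and the graph, feature values, and function names are assumptions for illustration.

```python
# Edges of a 3-node path graph: 0 - 1 - 2.
EDGES = [(0, 1), (1, 2)]


def dirichlet_energy(x):
    """Sum of squared feature differences across edges (one scalar per node)."""
    return sum((x[i] - x[j]) ** 2 for i, j in EDGES)


def normalized_energy(x, eps=1e-30):
    """Dirichlet energy divided by the squared norm, so overall scale cancels out."""
    return dirichlet_energy(x) / (sum(v * v for v in x) + eps)


smoothed = [2.0, 2.0, 2.0]       # identical features: genuine smoothing
collapsed = [1e-9, 2e-9, 4e-9]   # distinct features that shrank toward zero

for name, x in [("smoothed", smoothed), ("collapsed", collapsed)]:
    print(name, dirichlet_energy(x), normalized_energy(x))
```

The raw energy is essentially zero for both inputs, so on its own it cannot tell smoothing apart from collapse. The scale-normalized variant stays at 5/21 for the collapsed features, the same value as for the unshrunk features [1, 2, 4], and is zero only for the genuinely smoothed ones.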
- The main weakness is the following: in my understanding, the problem with deep GNNs is not just vanishing gradients. Over-smoothing and over-squashing can still appear with healthy gradients. Fixes like residual/identity mappings, careful init, normalization, JK connections, DropEdge, etc., help optimization and slow over-smoothing, but they don’t fully eliminate over-squashing or the diffusion-limit behavior. So the statement “performance degradation is not a phenomenon specific to GNNs, and
Taxonomy
Topics: Advanced Graph Neural Networks · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
