The Final Layer Holds the Key: A Unified and Efficient GNN Calibration Framework
Jincheng Huang, Jie Xu, Xiaoshuang Shi, Ping Hu, Lei Feng, Xiaofeng Zhu

TL;DR
This paper introduces a unified, efficient GNN calibration framework that improves confidence estimates by focusing on class-centroid and node-level calibration, reducing computational overhead and enhancing reliability.
Contribution
It provides a theoretical framework linking confidence calibration to class-centroid and node-level adjustments, proposing a simple method to improve GNN calibration without extra components.
Findings
Lowering weight decay in the final layer reduces GNN under-confidence
Node-level calibration improves confidence at a finer granularity
Method outperforms existing calibration techniques in experiments
Abstract
Graph Neural Networks (GNNs) have demonstrated remarkable effectiveness on graph-based tasks. However, their predictive confidence is often miscalibrated, typically exhibiting under-confidence, which harms the reliability of their decisions. Existing calibration methods for GNNs normally introduce additional calibration components, which fail to capture the intrinsic relationship between the model and the prediction confidence, resulting in limited theoretical guarantees and increased computational overhead. To address this issue, we propose a simple yet efficient graph calibration method. We establish a unified theoretical framework revealing that model confidence is jointly governed by class-centroid-level and node-level calibration at the final layer. Based on this insight, we theoretically show that reducing the weight decay of the final-layer parameters alleviates GNN…
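One intuition for why final-layer weights matter for confidence can be sketched as follows. This is an illustration under stated assumptions, not the paper's derivation: the weight matrix `W`, the representation `h`, and the idea of emulating stronger weight decay by uniformly shrinking `W` are all hypothetical choices made for the example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def final_layer_logits(weights, hidden):
    # weights: one row per class; hidden: penultimate node representation.
    return [sum(w * x for w, x in zip(row, hidden)) for row in weights]

def confidence(weights, hidden):
    # Confidence = maximum softmax probability.
    return max(softmax(final_layer_logits(weights, hidden)))

# Hypothetical 3-class final layer and a single node representation.
W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
h = [1.0, 0.2]

# Assumption: stronger weight decay is emulated by shrinking W uniformly.
# The predicted class stays the same, but the confidence drops.
for scale in (1.0, 0.5, 0.25):
    shrunk = [[scale * w for w in row] for row in W]
    print(f"scale={scale:.2f} confidence={confidence(shrunk, h):.3f}")
```

In frameworks such as PyTorch, a per-layer weight decay is typically set by giving the final layer its own optimizer parameter group with a separate `weight_decay` value; the exact setting used by the authors is not stated in this summary.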
Peer Reviews
Decision: Submitted to ICLR 2026
1. This is the paper's most significant strength. It moves beyond heuristic-based calibration by providing a rigorous theoretical analysis. 2. The proposed SCAR method consistently outperforms a wide range of strong baselines across multiple datasets.
1. The node-level calibration is refined in Eq. 10 to account for the structural bias of GNNs (nodes closer to training data get more similar representations). While this is a thoughtful addition, its evaluation is limited. An ablation study showing the performance gain of using two parameters $\alpha$ and $\beta$ over a single one would have strengthened this claim. 2. The details of the high-order neighbors of the training nodes are not well specified. 3. Sensitivity analysis on hyper-paramete
1. The authors are the first to theoretically show that final-layer weight decay aggravates GNN under-confidence, and they mitigate this by reducing the decay. 2. They propose a training-free node-level calibration method as a fine-grained complement to class-centroid-level calibration. 3. They develop a unified theoretical framework showing that both calibration levels jointly govern model confidence, and validate the method’s superiority across diverse settings.
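The idea of a training-free node-level adjustment can be sketched as pulling a test node's representation toward the centroid of its predicted class. This is a minimal illustration only: the dot-product scoring, the single interpolation weight `alpha`, and the centroid values are assumptions made for the example, and it does not reproduce the paper's Eq. 10 or its two parameters.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def centroid_logits(h, centroids):
    # Assumption: class scores are dot products with class centroids.
    return [sum(a * b for a, b in zip(h, c)) for c in centroids]

def pull_toward_centroid(h, centroids, alpha):
    # Find the predicted class, then interpolate h toward its centroid.
    logits = centroid_logits(h, centroids)
    pred = max(range(len(logits)), key=logits.__getitem__)
    target = centroids[pred]
    moved = [(1.0 - alpha) * x + alpha * t for x, t in zip(h, target)]
    return moved, pred

# Hypothetical two-class centroids and one test-node representation.
centroids = [[3.0, 0.0], [0.0, 3.0]]
h = [1.0, 0.6]

before = max(softmax(centroid_logits(h, centroids)))
moved, pred = pull_toward_centroid(h, centroids, alpha=0.5)
after = max(softmax(centroid_logits(moved, centroids)))
print(f"predicted class={pred} before={before:.3f} after={after:.3f}")
```

No parameters are trained here; only the representation is shifted at inference time. Whether confidence rises depends on the centroid geometry, so the increase seen in this example should not be read as a general guarantee.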
1. Missing important related work: Given that the paper focuses on confidence calibration, it is concerning that several key papers in the area of uncertainty estimation or calibration for GNNs are not cited or discussed [1-4]. 2. Limited baselines: The experimental comparisons would benefit from the inclusion of recent calibration methods [5]. 3. Restricted backbone models: The authors only evaluate their method on GCN and GAT. While these are classical models, they are no longer sufficient to
- This paper provides a theoretical connection between the under-confidence of GNNs and the final layer's weight decay, which is valuable given the lack of theoretical analysis in the GNN calibration literature. - The proposed method is simple yet effective, avoiding the need to train additional calibration networks as required by many existing methods. - Extensive experiments show that SCAR substantially reduces ECE compared to prior baselines, while maintaining the original classification accuracy of GNN
- The proposed node-level calibration assumes that pushing test nodes toward their predicted class centroids improves confidence, which may not hold under settings such as out-of-distribution (OOD) conditions. For instance, in OOD graphs, pushing test nodes toward centroids learned from training data can degrade calibration. - If the original GNNs are trained with zero weight decay, the proposed method may be partially inapplicable. - While SCAR is efficient, it needs to search for the optimal confi
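The ECE metric mentioned in the reviews (Expected Calibration Error) is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence per bin. The sketch below is a generic equal-width-bin version, not the paper's evaluation code; the function name, the default of 10 bins, and the left-closed binning convention are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # confidences: max softmax probability per prediction, in [0, 1].
    # correct: 1 if the prediction was right, else 0.
    n = len(confidences)
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        # Equal-width bins [i/n_bins, (i+1)/n_bins); conf == 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += ok
        count[idx] += 1
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece

# An under-confident toy model: always right, but only 60% confident.
under_confident = expected_calibration_error([0.6] * 5, [1] * 5)
# A calibrated toy model: 75% confident and right 3 times out of 4.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
print(under_confident, calibrated)
```

Because ECE takes an absolute gap, it does not distinguish under-confidence from over-confidence; inspecting the sign of accuracy minus confidence per bin is one way to tell them apart.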
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Machine Learning in Healthcare
Methods: Weight Decay
