Edge-aware baselines for ogbn-proteins in PyTorch Geometric: species-wise normalization, post-hoc calibration, and cost-accuracy trade-offs
Aleksandar Stanković, Dejan Lisica

TL;DR
This paper establishes reproducible edge-aware baselines for ogbn-proteins in PyTorch Geometric. It studies how edge features are aggregated into node inputs and which normalization layer is used, and applies post-hoc calibration to improve decision quality.
Contribution
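It compares sum, mean, and max edge-to-node aggregation and three normalization schemes (LayerNorm, BatchNorm, and a species-aware Conditional LayerNorm), reporting compute cost alongside accuracy. It also applies post-hoc per-label temperature scaling and per-label thresholds to improve decision quality and calibration.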
Findings
Sum-based edge-to-node features outperform mean and max aggregations.
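BatchNorm achieves the highest ROC-AUC, while Conditional LayerNorm matches the AUC frontier with better thresholded F1.
Post-hoc calibration (per-label temperature scaling plus per-label thresholds) substantially improves micro-F1 and expected calibration error (ECE) with minimal impact on AUC.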
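The sum, mean, and max edge-to-node aggregations compared above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `aggregate_edge_features`, the choice of the target endpoint as the receiving node, and the zero fill for isolated nodes are all assumptions made for the example.

```python
import numpy as np

def aggregate_edge_features(edge_index, edge_attr, num_nodes, reduce="sum"):
    """Pool per-edge features onto nodes (hypothetical helper).

    edge_index: int array of shape [2, E], rows are (source, target).
    edge_attr:  array of shape [E, D] with one feature row per edge.
    Returns an array of shape [num_nodes, D].
    """
    edge_attr = np.asarray(edge_attr, dtype=float)
    dst = np.asarray(edge_index)[1]  # assumption: aggregate onto the target node
    dim = edge_attr.shape[1]
    deg = np.bincount(dst, minlength=num_nodes)  # incident edges per node

    if reduce in ("sum", "mean"):
        out = np.zeros((num_nodes, dim))
        np.add.at(out, dst, edge_attr)  # unbuffered scatter-add
        if reduce == "mean":
            out /= np.maximum(deg, 1)[:, None]  # avoid division by zero
        return out

    if reduce == "max":
        out = np.full((num_nodes, dim), -np.inf)
        np.maximum.at(out, dst, edge_attr)  # unbuffered scatter-max
        out[deg == 0] = 0.0  # assumption: isolated nodes get zeros
        return out

    raise ValueError(f"unknown reduce: {reduce!r}")
```

For ogbn-proteins, `edge_attr` would have 8 columns, so every node receives an 8-dimensional input vector regardless of the reduction chosen. Unlike mean and max, the sum keeps information about how many edges a node has.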
Abstract
We present reproducible, edge-aware baselines for ogbn-proteins in PyTorch Geometric (PyG). We study two system choices that dominate practice: (i) how 8-dimensional edge evidence is aggregated into node inputs, and (ii) how edges are used inside message passing. Our strongest baseline is GraphSAGE with sum-based edge-to-node features. We compare LayerNorm (LN), BatchNorm (BN), and a species-aware Conditional LayerNorm (CLN), and report compute cost (time, VRAM, parameters) together with accuracy (ROC-AUC) and decision quality. In our primary experimental setup (hidden size 512, 3 layers, 3 seeds), sum consistently beats mean and max; BN attains the best AUC, while CLN matches the AUC frontier with better thresholded F1. Finally, post-hoc per-label temperature scaling plus per-label thresholds substantially improves micro-F1 and expected calibration error (ECE) with negligible AUC…
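The post-hoc calibration step named in the abstract (per-label temperature scaling, per-label thresholds, ECE) can be illustrated with a short NumPy sketch. The excerpt does not say how temperatures or thresholds are fitted, so the grid searches, the grid ranges, the equal-width binning for ECE, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_nll(logits, labels, eps=1e-12):
    """Mean binary cross-entropy of sigmoid(logits) against 0/1 labels."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def fit_temperatures(val_logits, val_labels, grid=None):
    """One temperature per label, chosen by grid search on validation NLL."""
    grid = np.linspace(0.25, 4.0, 76) if grid is None else np.asarray(grid)
    temps = np.ones(val_logits.shape[1])
    for k in range(val_logits.shape[1]):
        losses = [binary_nll(val_logits[:, k] / t, val_labels[:, k]) for t in grid]
        temps[k] = grid[int(np.argmin(losses))]
    return temps

def f1_score(labels, preds):
    """Binary F1; defined as 0 when there are no positives at all."""
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def fit_thresholds(val_probs, val_labels, grid=None):
    """One decision threshold per label, chosen to maximize validation F1."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    thresholds = np.full(val_probs.shape[1], 0.5)
    for k in range(val_probs.shape[1]):
        scores = [f1_score(val_labels[:, k], (val_probs[:, k] >= t).astype(int))
                  for t in grid]
        thresholds[k] = grid[int(np.argmax(scores))]
    return thresholds

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over all (sample, label) entries with equal-width probability bins."""
    p, y = np.ravel(probs), np.ravel(labels)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(p, edges[1:-1])  # values in 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(ece)
```

In use, both `fit_temperatures` and `fit_thresholds` would be run on validation data only and then applied unchanged to test logits, as in `(sigmoid(test_logits / temps) >= thresholds)`. Dividing logits by a positive temperature does not change their ordering within a label, which is consistent with the reported negligible effect on per-label ROC-AUC.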
Taxonomy
Topics: Bioinformatics and Genomic Networks · Cell Image Analysis Techniques · Protein Structure and Dynamics
