TL;DR
This paper introduces GRL-Safety, a benchmark that evaluates the safety of graph representation learning (GRL) methods along five safety axes under deployment stresses. It finds that safety behavior varies by axis and that capability gaps remain even for the strongest methods.
Contribution
The paper presents a multi-axis safety evaluation benchmark that covers twelve representative GRL methods on twenty-five graph datasets and reports how each method's safety performance changes under deployment stresses.
Findings
Safety behavior depends on representation design and graph stress factors.
Foundation-era methods have axis-specific strengths, not broad safety superiority.
Some deployment scenarios remain challenging even for top-performing methods.
Abstract
Graph representation learning (GRL) has evolved from topology-only graph embeddings to task-specific supervised GNNs, and more recently to reusable representations and graph foundation models (GFMs). However, existing evaluations mainly measure clean transfer, adaptation, and task coverage. It remains unclear whether GRL methods stay reliable when deployment stresses affect graph signals, graph contexts, label support, structural groups, or predictive evidence. We introduce GRL-Safety, a multi-axis safety evaluation benchmark for GRL. GRL-Safety evaluates twelve representative methods, spanning topology-only embedding methods, supervised GNNs, self-supervised graph models, and GFMs, on twenty-five graph datasets under standardized evaluation conditions while preserving method-native adaptation. The evaluation covers five safety axes: corruption robustness, OOD generalization, class…
