TL;DR
This paper introduces a comprehensive benchmark for Graph Anomaly Detection (GAD) that evaluates models under three realistic conditions: large-scale graphs, scarce anomalies, and incomplete data. The results highlight limitations of current methods.
Contribution
It provides a multi-dimensional benchmark with diverse datasets and a systematic evaluation revealing key limitations of existing GAD models in real-world scenarios.
Findings
Most GNN-based methods cannot scale to million-node graphs due to memory constraints.
Detection performance drops sharply under realistic anomaly ratios, with recall often falling to zero.
Reconstruction-based models are highly sensitive to attribute imputation strategies.
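To give a sense of scale for the first finding, here is a minimal back-of-the-envelope sketch. It assumes a method materializes a dense node-by-node matrix (for example, a full adjacency reconstruction) in 32-bit floats. That assumption is ours for illustration; the summary does not say which memory bottleneck each evaluated model hits. The function names are hypothetical.

```python
def dense_adjacency_bytes(num_nodes: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store a dense num_nodes x num_nodes matrix.

    Assumption (not from the paper): the model holds the full matrix in
    memory, with 4 bytes per entry (float32) by default.
    """
    return num_nodes * num_nodes * bytes_per_entry


def bytes_to_tib(num_bytes: int) -> float:
    """Convert bytes to tebibytes (1 TiB = 2**40 bytes)."""
    return num_bytes / 2**40


if __name__ == "__main__":
    # 3.7 million is the node count the abstract cites for the largest graphs.
    for n in (10_000, 1_000_000, 3_700_000):
        size = dense_adjacency_bytes(n)
        print(f"{n:>9,} nodes -> {bytes_to_tib(size):.4f} TiB")
```

Under this assumption, a 3.7-million-node graph needs about 54.76 × 10^12 bytes (roughly 50 TiB) for a single dense matrix, which is far beyond the memory of a single accelerator. Sparse storage and mini-batch sampling avoid this quadratic cost, but not every method supports them.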
Abstract
Graph Anomaly Detection (GAD) is a critical task in graph machine learning with vital applications in financial fraud detection and social platform governance. However, existing GAD benchmarks are often restricted to small-scale, curated graphs with relatively balanced anomaly ratios, leaving a substantial gap between academic evaluation and real-world deployment. To bridge this gap, we present a multi-dimensional benchmark that systematically evaluates GAD models under three deployment-relevant challenges: million-scale graphs, extreme anomaly scarcity, and missing node attributes. We derive a family of controlled benchmark variants from five diverse graphs, including two native industrial-scale datasets with over 3.7 million nodes. Our extensive evaluation of nine representative GAD models reveals three major limitations: (1) most GNN-based methods fail to scale to million-node graphs…
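The abstract mentions controlled benchmark variants for anomaly scarcity and missing attributes. The sketch below shows one plausible way such variants could be derived from a labeled graph; it is not the authors' procedure. All function names, the per-entry masking scheme, and the use of mean imputation are assumptions made for illustration. It uses only the Python standard library and operates on node labels and feature rows; removing the edges of dropped nodes is left out.

```python
import random
from typing import List, Optional, Sequence


def subsample_anomalies(labels: Sequence[int], target_ratio: float,
                        seed: int = 0) -> List[int]:
    """Return sorted node ids to keep so anomalies make up at most target_ratio.

    All normal nodes (label 0) are kept; anomalies (label 1) are randomly
    subsampled. Hypothetical helper, not the benchmark's own procedure.
    """
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be in (0, 1)")
    rng = random.Random(seed)
    normal = [i for i, y in enumerate(labels) if y == 0]
    anomalous = [i for i, y in enumerate(labels) if y == 1]
    # Largest k with k / (k + len(normal)) <= target_ratio.
    max_anomalies = int(target_ratio * len(normal) / (1.0 - target_ratio))
    kept = rng.sample(anomalous, min(max_anomalies, len(anomalous)))
    return sorted(normal + kept)


def mask_attributes(features: Sequence[Sequence[float]], missing_rate: float,
                    seed: int = 0) -> List[List[Optional[float]]]:
    """Replace each attribute entry with None with probability missing_rate."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in row]
            for row in features]


def mean_impute(features: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Fill None entries with the column mean of the observed values.

    Columns with no observed value fall back to 0.0 (an arbitrary choice).
    """
    if not features:
        return []
    means = []
    for c in range(len(features[0])):
        observed = [row[c] for row in features if row[c] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[c] if v is None else v for c, v in enumerate(row)]
            for row in features]
```

Swapping `mean_impute` for another strategy (zero filling, neighbor averaging) while holding the mask fixed is one way to probe the sensitivity to imputation that the findings describe.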
