GAD in the Wild: Benchmarking Graph Anomaly Detection under Realistic Deployment Challenges

Jingjing Zhou; Shiyu Huang; Qing Qing; Zuquan Yuan; Huafei Huang; Ziqi Xu; Mingliang Hou; Xikun Zhang; Renqiang Luo; Ivan Lee

arXiv:2605.07133 · cs.LG · May 11, 2026

TL;DR

This paper introduces a benchmark for Graph Anomaly Detection (GAD) that evaluates models under three realistic deployment challenges: million-scale graphs, extreme anomaly scarcity, and missing node attributes. The evaluation highlights limitations of current methods.

Contribution

The paper provides a multi-dimensional benchmark derived from five diverse graphs, together with a systematic evaluation of nine representative GAD models that reveals key limitations of existing methods in real-world scenarios.

Findings

01. Most GNN-based methods cannot scale to million-node graphs due to memory constraints.

02. Detection performance drops sharply with realistic anomaly ratios, often to zero recall.

03. Reconstruction-based models are highly sensitive to attribute imputation strategies.
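
To make the third finding concrete, the sketch below shows two simple ways of filling in missing node attributes. It is an illustration only, not the benchmark's code: the helper names (mask_attributes, impute_zero, impute_column_mean) and the toy matrix are assumptions, and the listing does not say which imputation strategies the paper actually compares.

```python
import math
import random


def mask_attributes(features, missing_rate, seed=0):
    """Return a copy of `features` with each entry set to NaN with probability `missing_rate`."""
    rng = random.Random(seed)
    return [
        [math.nan if rng.random() < missing_rate else value for value in row]
        for row in features
    ]


def impute_zero(features):
    """Replace every missing (NaN) entry with 0.0."""
    return [[0.0 if math.isnan(v) else v for v in row] for row in features]


def impute_column_mean(features):
    """Replace each missing entry with the mean of the observed values in its column."""
    n_cols = len(features[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in features if not math.isnan(row[j])]
        # Fall back to 0.0 when a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in features
    ]


# Toy node-attribute matrix: 4 nodes, 2 attributes (hypothetical values).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]

# A randomly masked copy, plus a hand-written masked copy for a stable comparison.
X_random_mask = mask_attributes(X, missing_rate=0.5, seed=0)
X_missing = [[1.0, math.nan], [math.nan, 20.0], [3.0, 30.0], [4.0, math.nan]]

X_zero = impute_zero(X_missing)
X_mean = impute_column_mean(X_missing)
```

The two imputed matrices differ in every entry that was missing, so a model trained to reconstruct node attributes is given a different reconstruction target under each strategy.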

Abstract

Graph Anomaly Detection (GAD) is a critical task in graph machine learning with vital applications in financial fraud detection and social platform governance. However, existing GAD benchmarks are often restricted to small-scale, curated graphs with relatively balanced anomaly ratios, leaving a substantial gap between academic evaluation and real-world deployment. To bridge this gap, we present a multi-dimensional benchmark that systematically evaluates GAD models under three deployment-relevant challenges: million-scale graphs, extreme anomaly scarcity, and missing node attributes. We derive a family of controlled benchmark variants from five diverse graphs, including two native industrial-scale datasets with over 3.7 million nodes. Our extensive evaluation of nine representative GAD models reveals three major limitations: (1) most GNN-based methods fail to scale to million-node graphs…
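
As a rough illustration of what an anomaly-scarce benchmark variant could look like, the sketch below drops labeled anomalies at random until they make up at most a target fraction of the nodes, then computes recall for a detector that flags nothing. This is an assumption about one plausible construction, not the authors' procedure. The function names and the 10% and 1% ratios are hypothetical, and a real graph variant would also need the induced subgraph on the kept nodes.

```python
import random


def subsample_anomalies(labels, target_ratio, seed=0):
    """Return sorted node indices to keep so anomalies are at most `target_ratio` of nodes.

    `labels` is a list of 0 (normal) / 1 (anomalous). All normal nodes are kept;
    anomalies are dropped uniformly at random.
    """
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be in (0, 1)")
    normal = [i for i, y in enumerate(labels) if y == 0]
    anomalous = [i for i, y in enumerate(labels) if y == 1]
    # Largest k with k / (len(normal) + k) <= target_ratio.
    k = int(target_ratio * len(normal) / (1.0 - target_ratio))
    k = min(k, len(anomalous))
    rng = random.Random(seed)
    kept_anomalies = rng.sample(anomalous, k)
    return sorted(normal + kept_anomalies)


def recall(labels, predictions):
    """Fraction of true anomalies that were flagged; 0.0 if there are no anomalies."""
    true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    positives = sum(labels)
    return true_positives / positives if positives else 0.0


# Hypothetical curated graph: 1000 nodes, 10% anomalies.
labels = [1] * 100 + [0] * 900

# Derive a scarce variant with at most 1% anomalies.
keep = subsample_anomalies(labels, target_ratio=0.01)
scarce_labels = [labels[i] for i in keep]

# A detector that never flags anything is right on about 99% of nodes
# in the scarce variant, yet its recall is zero.
all_normal = [0] * len(scarce_labels)
accuracy = sum(1 for y, p in zip(scarce_labels, all_normal) if y == p) / len(scarce_labels)
zero_recall = recall(scarce_labels, all_normal)
```

The contrast between accuracy and recall here is why recall-oriented metrics matter once anomalies become rare.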

Code & Models

Repositories

Benchmark_GAD-E7A3 (GitHub)
