A Cascaded Graph Neural Network for Joint Root Cause Localization and Analysis in Edge Computing Environments
Duneesha Fernando, Maria A. Rodriguez, Rajkumar Buyya

TL;DR
This paper introduces a cascaded graph neural network framework that improves scalability and reduces inference latency for root cause localization and analysis in large, distributed edge computing environments, while maintaining high diagnostic accuracy.
Contribution
It proposes a novel cascaded GNN architecture with communication-driven clustering to enable scalable, low-latency root cause analysis in complex edge computing microservice systems.
Findings
Achieves comparable accuracy to centralized GNNs
Maintains near-constant inference latency with increasing graph size
Demonstrates effectiveness on the MicroCERCL benchmark and large-scale datasets
Abstract
Edge computing environments host increasingly complex microservice-based IoT applications that are prone to performance anomalies propagating across dependent services. Identifying the faulty component (root cause localization, RCL) and the underlying fault type (root cause analysis) is essential for timely mitigation. Supervised graph neural networks (GNNs) currently represent the state of the art for joint root cause localization and analysis. However, existing approaches rely on centralized processing over full-system graphs, leading to high inference latency and limited scalability in large, distributed edge environments. In this paper, we propose a cascaded GNN framework for joint RCL and fault type identification that explicitly addresses these scalability challenges. Our approach employs communication-driven clustering to partition large service graphs into highly interacting…
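The abstract is cut off before the clustering step is described, so the following is only an illustrative sketch of what partitioning a service graph by communication intensity could look like. It is not the authors' algorithm: the function name `cluster_by_communication`, the `min_calls` threshold, the service names, and the call counts are all hypothetical, and a simple threshold plus union-find stands in for whatever clustering the paper actually uses.

```python
from collections import defaultdict


def cluster_by_communication(edges, min_calls):
    """Group services that are linked by heavily used call edges.

    edges: iterable of (caller, callee, call_count) tuples (hypothetical format).
    min_calls: edges with fewer calls than this do not merge clusters.
    Returns a sorted list of clusters, each a sorted list of service names.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for caller, callee, call_count in edges:
        root_a, root_b = find(caller), find(callee)
        # Merge only when the two services interact heavily.
        if call_count >= min_calls and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for service in list(parent):
        clusters[find(service)].add(service)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical call graph: (caller, callee, calls per minute).
example_edges = [
    ("gateway", "auth", 120),
    ("gateway", "orders", 95),
    ("orders", "payments", 80),
    ("orders", "inventory", 3),
    ("inventory", "warehouse", 60),
]

example_clusters = cluster_by_communication(example_edges, min_calls=50)
```

With these made-up numbers, the weak `orders` to `inventory` edge is the only one below the threshold, so the graph splits into two groups. In a cascaded setup, each group could then be processed by its own smaller model, which is the intuition behind keeping per-cluster inference cost independent of total graph size.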
Taxonomy
Topics: Software System Performance and Reliability · IoT and Edge/Fog Computing · Software-Defined Networks and 5G
