Data-driven root-cause analysis for distributed system anomalies
Chao Liu, Kin Gwn Lore, Soumik Sarkar

TL;DR
This paper introduces a novel data-driven framework for root-cause analysis in complex distributed cyber-physical systems, utilizing symbolic dynamics and deep learning to improve accuracy and scalability in fault diagnosis.
Contribution
It proposes two new approaches, S^3 and A^3, for root-cause analysis that outperform traditional models and handle multiple operating modes effectively.
Findings
High accuracy in root-cause analysis demonstrated on synthetic data.
Effective handling of multiple nominal operation modes.
Scalable framework that maintains accuracy as system complexity grows.
Abstract
Modern distributed cyber-physical systems encounter a large variety of anomalies, and in many cases they are vulnerable to catastrophic fault propagation scenarios due to strong connectivity among the sub-systems. In this regard, root-cause analysis becomes highly intractable due to complex fault propagation mechanisms in combination with diverse operating modes. This paper presents a new data-driven framework for root-cause analysis for addressing such issues. The framework is based on a spatiotemporal feature extraction scheme for distributed cyber-physical systems built on the concept of symbolic dynamics for discovering and representing causal interactions among subsystems of a complex system. We present two approaches for root-cause analysis, namely the sequential state switching (S^3, based on the free energy concept of a Restricted Boltzmann Machine, RBM) and artificial anomaly…
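The abstract refers to the free energy of an RBM without stating it. For a standard RBM with binary units, the free energy of a visible vector v is commonly written as F(v) = -b·v - Σ_j log(1 + exp(c_j + (vW)_j)). The sketch below computes that quantity with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `rbm_free_energy`, the parameter names `W`, `b`, `c`, and the toy values are all hypothetical, and how the sequential state switching approach uses this quantity is not described in this summary.

```python
import numpy as np


def rbm_free_energy(v, W, b, c):
    """Free energy of a binary RBM (assumed standard form, hypothetical names).

    F(v) = -b.v - sum_j log(1 + exp(c_j + (v W)_j))

    v: visible vector(s), shape (n_visible,) or (batch, n_visible)
    W: weights, shape (n_visible, n_hidden)
    b: visible biases, shape (n_visible,)
    c: hidden biases, shape (n_hidden,)
    """
    v = np.asarray(v, dtype=float)
    hidden_input = v @ W + c
    # np.logaddexp(0, x) evaluates log(1 + exp(x)) in a numerically stable way.
    return -(v @ b) - np.logaddexp(0.0, hidden_input).sum(axis=-1)


# Toy parameters (assumed): with all-zero weights and biases, every hidden
# unit contributes log(2), so F(v) = -n_hidden * log(2) for any v.
W = np.zeros((3, 2))
b = np.zeros(3)
c = np.zeros(2)
v = np.array([1.0, 0.0, 1.0])
energy = rbm_free_energy(v, W, b, c)
```

Lower free energy corresponds to a visible configuration the model considers more probable, which is why it is a natural score for comparing observed patterns against a model trained on nominal data.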
Taxonomy
Topics: Complex Network Analysis Techniques · Scientific Computing and Data Management · Software System Performance and Reliability
