Next Stop "NoOps": Enabling Cross-System Diagnostics Through Graph-based Composition of Logs and Metrics
Michał Zasadziński, Marc Solé, Alvaro Brandon, Victor Muntés-Mulero, David Carrera

TL;DR
This paper introduces a graph-based framework for cross-system diagnostics that enables knowledge transfer and automated troubleshooting in diverse IT systems, supporting the NoOps paradigm.
Contribution
It presents a novel weighted graph representation for system states that facilitates knowledge transfer and diagnostics across different IT systems.
Findings
Effective knowledge transfer between systems demonstrated
High-quality diagnostics achieved across Spark, Hadoop, Kafka, and Cassandra
Graph similarity evaluation enables accurate failure detection
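As a rough illustration of how comparing weighted graphs could support failure detection, the sketch below scores an observed system state against previously labelled states and returns the closest match. It is not the paper's method: the weighted Jaccard measure, the edge-dictionary encoding, the function names, and the sample node names are all assumptions chosen for brevity.

```python
from typing import Dict, Tuple

# Hypothetical encoding: a system state is a dict mapping an edge
# (source node, target node) to a non-negative weight.
Edge = Tuple[str, str]
WeightedGraph = Dict[Edge, float]


def weighted_jaccard(a: WeightedGraph, b: WeightedGraph) -> float:
    """Sum of per-edge minimum weights divided by sum of per-edge maximum weights.

    Assumes non-negative weights. Two empty graphs are treated as identical.
    """
    edges = set(a) | set(b)
    numerator = sum(min(a.get(e, 0.0), b.get(e, 0.0)) for e in edges)
    denominator = sum(max(a.get(e, 0.0), b.get(e, 0.0)) for e in edges)
    return numerator / denominator if denominator > 0 else 1.0


def nearest_label(
    observed: WeightedGraph, labelled: Dict[str, WeightedGraph]
) -> Tuple[str, float]:
    """Return the label of the most similar known state and its similarity score."""
    best = max(labelled, key=lambda name: weighted_jaccard(observed, labelled[name]))
    return best, weighted_jaccard(observed, labelled[best])


# Made-up example states; node names are illustrative only.
known_states: Dict[str, WeightedGraph] = {
    "disk_full": {
        ("log:IOException", "metric:disk_usage"): 3.0,
        ("metric:disk_usage", "metric:latency"): 1.0,
    },
    "healthy": {
        ("metric:cpu", "metric:latency"): 1.0,
    },
}

observed_state: WeightedGraph = {
    ("log:IOException", "metric:disk_usage"): 2.0,
    ("metric:disk_usage", "metric:latency"): 1.0,
}

label, score = nearest_label(observed_state, known_states)
print(label, score)
```

Because the comparison depends only on edge labels and weights, states recorded on one system can in principle be compared with states from another, provided both use a shared node vocabulary. That shared vocabulary is itself an assumption of this sketch.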
Abstract
Performing diagnostics in IT systems is an increasingly complicated task that even the most skilled operators cannot complete in satisfactory time. Systems and their architectures change rapidly in response to business and user demand. Many organizations see value in NoOps (No Operations), a model for maintenance and management. One implementation of this model is a system that is maintained automatically, without any human intervention. The path to NoOps involves not only precise and fast diagnostics but also reusing as much knowledge as possible after a system is reconfigured or changed. The biggest challenge is to take knowledge gained on one IT system and reuse it to diagnose another, different system. We propose a framework of weighted graphs that can transfer knowledge and perform high-quality diagnostics of IT systems. We encode…
Taxonomy
Topics: Software System Performance and Reliability · Data Quality and Management · Software Engineering Research
