Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis
Rohan Kumar, Jason Li, Zongshun Zhang, Syed Mohammad Qasim, Gianluca Stringhini, Ayse K. Coskun

TL;DR
Praxium is a framework that leverages AI-based telemetry and dependency analysis to detect cloud microservice anomalies and accurately identify their root causes, improving diagnosis speed and reliability.
Contribution
The paper introduces Praxium, a novel system combining anomaly detection and causal impact analysis for cloud microservices, enhancing diagnosis accuracy and efficiency.
Findings
Anomaly detection achieves >0.97 macro-F1 across synthetic tests.
Causal impact analysis reliably identifies root causes.
Effective hyperparameter tuning improves practical deployment.
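For readers unfamiliar with the macro-F1 figure cited above, the following is a minimal sketch of how that metric is generally computed: the F1 score is calculated per class and then averaged without weighting by class frequency, so a rare "anomaly" class counts as much as the common "normal" class. The function name `macro_f1` and the sample labels are illustrative assumptions, not code or data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only; not results from the paper.
y_true = ["normal", "normal", "normal", "anomaly", "anomaly", "normal"]
y_pred = ["normal", "normal", "anomaly", "anomaly", "anomaly", "normal"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.3f}")
```

Because every class contributes equally, a high macro-F1 suggests the detector performs well on both normal and anomalous samples, rather than only on the majority class.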
Abstract
As the modern microservice architecture for cloud applications grows in popularity, cloud services are becoming increasingly complex and more vulnerable to misconfiguration and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which lacks scalability in the face of the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts, containing new software installations, have complex interactions with the components of an application. Consequently, this added difficulty in attributing anomalous behavior to any specific installation or rollout results in potentially slower resolution times. To address the gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium aids administrators in evaluating target metric performance in the context…
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Software Reliability and Analysis Research
