PRAXIS: Integrating Program Analysis with Observability for Root-Cause Analysis
Shengkun Cui, Rahul Krishna, Saurabh Jha, Ravishankar K. Iyer

TL;DR
PRAXIS combines program analysis with observability to make root-cause analysis (RCA) of cloud incidents more accurate and more token-efficient.
Contribution
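It introduces an LLM-driven orchestrator that performs a structured traversal over a service dependency graph (SDG) and per-microservice hammock-block program dependence graphs (PDGs) to diagnose code- and configuration-caused incidents.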
Findings
PRAXIS improves RCA accuracy by up to 6.3x compared to ReAct baselines.
PRAXIS reduces token consumption by 5.3x compared to the same baselines.
PRAXIS is demonstrated on 30 real-world incidents, which are being compiled into an RCA benchmark.
Abstract
Unresolved production cloud incidents cost an average of over $2M per hour. This paper introduces PRAXIS, an orchestrator that manages and deploys an agentic workflow for diagnosing code- and configuration-caused cloud incidents. PRAXIS employs an LLM-driven structured traversal over two types of graph: (1) a service dependency graph (SDG) that captures microservice-level dependencies; and (2) a hammock-block program dependence graph (PDG) that captures code-level dependencies for each microservice. Compared to state-of-the-art ReAct baselines, PRAXIS improves RCA accuracy by up to 6.3x while reducing token consumption by 5.3x. PRAXIS is demonstrated on a set of 30 comprehensive real-world incidents that is being compiled into an RCA benchmark.
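The two-level traversal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the graphs, the service and block names, and the `flagged_frontier` and `diagnose` helpers are all hypothetical, the PDG here uses plain named blocks rather than real hammock blocks, and simple predicate functions stand in for the LLM's judgment about whether a node looks implicated. The sketch assumes that a root-cause candidate is a flagged node none of whose dependencies are flagged.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, List[str]]

# Hypothetical service dependency graph (SDG): service -> services it calls.
SDG: Graph = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payment"],
    "catalog": [],
    "payment": [],
}

# Hypothetical per-service program dependence graphs (PDGs):
# code block -> blocks it depends on.
PDGS: Dict[str, Graph] = {
    "payment": {
        "handle_charge": ["validate_card", "load_config"],
        "validate_card": [],
        "load_config": [],
    },
}


def flagged_frontier(graph: Graph, start: str,
                     is_flagged: Callable[[str], bool]) -> List[str]:
    """Walk from `start` through flagged nodes only and return the flagged
    nodes that have no flagged dependency (the candidate root causes)."""
    frontier: List[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if not is_flagged(node):
            continue  # prune: do not expand nodes that look healthy
        flagged_deps = [d for d in graph.get(node, []) if is_flagged(d)]
        if not flagged_deps:
            frontier.append(node)
        for dep in flagged_deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return frontier


def diagnose(entry_service: str, entry_blocks: Dict[str, str],
             service_flagged: Callable[[str], bool],
             block_flagged: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """Narrow to suspect services on the SDG, then to suspect blocks on
    each suspect service's PDG."""
    results: List[Tuple[str, str]] = []
    for svc in flagged_frontier(SDG, entry_service, service_flagged):
        entry = entry_blocks.get(svc)
        if entry is None:
            continue  # no PDG entry point known for this service
        pdg = PDGS.get(svc, {})
        for block in flagged_frontier(pdg, entry,
                                      lambda b, s=svc: block_flagged(s, b)):
            results.append((svc, block))
    return results


# Stand-ins for the LLM's judgment (in practice, driven by observability data).
bad_services = {"frontend", "checkout", "payment"}
bad_blocks = {("payment", "handle_charge"), ("payment", "load_config")}

suspects = diagnose(
    "frontend",
    {"payment": "handle_charge"},
    lambda s: s in bad_services,
    lambda s, b: (s, b) in bad_blocks,
)
print(suspects)
```

Pruning healthy nodes early is what keeps such a traversal narrow: only services and blocks that look implicated are expanded, which is one plausible reason a structured walk could need fewer tokens than free-form exploration. In a real system the predicate calls would be expensive and would likely be cached per node.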
