Simplifying Root Cause Analysis in Kubernetes with StateGraph and LLM
Yong Xiang (1), Charley Peter Chen (2), Liyi Zeng (3), Wei Yin (1), Xin Liu (1), Hu Li (4), Wei Xu (1) ((1) Tsinghua University, (2) Harmonic Inc, (3) Peng Cheng Laboratory, (4) Unaffiliated)

TL;DR
This paper presents SynergyRCA, a tool that combines large language models with graph databases for root cause analysis in Kubernetes. It identifies root causes in about two minutes on average, with approximately 90% precision, in complex, dynamic environments.
Contribution
Introduces SynergyRCA, integrating LLMs with StateGraph and MetaGraph for effective, context-aware root cause analysis in Kubernetes clusters.
Findings
Identifies root causes within about two minutes on average.
Achieves approximately 90% precision in root cause identification.
Successfully detects both known and novel issues.
Abstract
Kubernetes, a notably complex and distributed system, utilizes an array of controllers to uphold cluster management logic through state reconciliation. Nevertheless, maintaining state consistency presents significant challenges due to unexpected failures, network disruptions, and asynchronous issues, especially within dynamic cloud environments. These challenges result in operational disruptions and economic losses, underscoring the necessity for robust root cause analysis (RCA) to enhance Kubernetes reliability. The development of large language models (LLMs) presents a promising direction for RCA. However, existing methodologies encounter several obstacles: Kubernetes incidents are diverse and evolving, their context is intricate, and they are polymorphic in nature. In this paper, we introduce SynergyRCA, an innovative tool that leverages LLMs…
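The state reconciliation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper and does not use the Kubernetes API: `ReplicaState`, `reconcile`, and `apply_actions` are hypothetical names chosen only to show the general idea of a controller comparing desired state with observed state and emitting actions to close the gap.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    """Hypothetical view of one workload: desired vs. observed pod count."""
    desired: int
    observed: int


def reconcile(state: ReplicaState) -> list[str]:
    """Return the actions that would move the observed count toward the desired one."""
    diff = state.desired - state.observed
    if diff > 0:
        return ["create_pod"] * diff
    if diff < 0:
        return ["delete_pod"] * (-diff)
    return []  # already consistent, nothing to do


def apply_actions(state: ReplicaState, actions: list[str]) -> ReplicaState:
    """Simulate applying the actions; a real cluster may fail or lag here."""
    for action in actions:
        state.observed += 1 if action == "create_pod" else -1
    return state


state = ReplicaState(desired=3, observed=1)
actions = reconcile(state)
state = apply_actions(state, actions)
```

In this toy model every action succeeds, so a second call to `reconcile` returns no actions. The inconsistencies described in the abstract arise when the apply step fails, is delayed, or races with other controllers, leaving desired and observed state out of sync.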
Taxonomy
Topics: Qualitative Comparative Analysis Research
