Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures
Liming Zhou, Ailing Liu, Hongwei Liu, Min He, Heng Zhang

TL;DR
This paper introduces RC-LLM, a novel root cause analysis method leveraging large language models and residual connection structures to effectively analyze complex, large-scale microservice architectures by integrating multi-source telemetry data.
Contribution
The paper proposes a residual-connection-based LLM approach for root cause analysis that models causal dependencies across microservices, improving accuracy and efficiency.
Findings
Achieves high accuracy in root cause localization.
Demonstrates efficiency on microservice datasets.
Effectively models temporal and causal dependencies.
Abstract
Root cause localization remains challenging in complex, large-scale microservice architectures. Complex fault propagation among microservices and the high dimensionality of telemetry data, including metrics, logs, and traces, limit the effectiveness of existing root cause analysis (RCA) methods. In this paper, a residual-connection-based RCA method using a large language model (LLM), named RC-LLM, is proposed. A residual-like hierarchical fusion structure is designed to integrate multi-source telemetry data, while the contextual reasoning capability of LLMs is leveraged to model temporal and cross-microservice causal dependencies. Experimental results on CCF-AIOps microservice datasets demonstrate that RC-LLM achieves strong accuracy and efficiency in root cause analysis.
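The abstract does not spell out the fusion structure, so the following is only an illustrative sketch of what a residual-like hierarchical fusion of metrics, logs, and traces could look like. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (residual_fuse, hierarchical_fusion), the fusion order, the tanh projection, and the toy feature vectors. The point is the residual pattern: each stage adds a learned update to the running representation instead of replacing it, so earlier modalities are preserved as later ones are merged in.

```python
import math
import random

def linear(x, weights):
    """Multiply vector x by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def residual_fuse(state, modality, weights):
    """One fusion stage (hypothetical): project [state; modality], squash
    with tanh, and add the result back onto the incoming state."""
    update = [math.tanh(u) for u in linear(state + modality, weights)]
    return [s + u for s, u in zip(state, update)]

def hierarchical_fusion(metrics, logs, traces, w_logs, w_traces):
    """Fuse modalities stage by stage; the order is an assumption."""
    state = list(metrics)                            # start from metric features
    state = residual_fuse(state, logs, w_logs)       # stage 1: add log evidence
    state = residual_fuse(state, traces, w_traces)   # stage 2: add trace evidence
    return state

def random_weights(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy per-service feature vectors (made up for the example).
rng = random.Random(0)
dim = 4
metrics = [0.2, -0.1, 0.5, 0.0]
logs = [1.0, 0.0, 0.0, 1.0]
traces = [0.3, 0.3, -0.2, 0.1]

fused = hierarchical_fusion(
    metrics, logs, traces,
    random_weights(dim, 2 * dim, rng),
    random_weights(dim, 2 * dim, rng),
)
print(fused)
```

With all-zero weights every update vanishes and the output equals the metric vector, which is the identity-path property that residual connections are meant to provide. In a pipeline of this kind, the fused representation would then be serialized or embedded as context for the LLM; that step is not shown here.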
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
