Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures

Liming Zhou; Ailing Liu; Hongwei Liu; Min He; Heng Zhang

arXiv:2602.08804·cs.AI·February 10, 2026

TL;DR

This paper introduces RC-LLM, a root cause analysis method for complex, large-scale microservice architectures. It combines a residual-like hierarchical fusion structure, which integrates multi-source telemetry data (metrics, logs, and traces), with the contextual reasoning of large language models.

Contribution

The paper proposes a residual-connection-based LLM approach to root cause analysis that models temporal and cross-microservice causal dependencies, and reports strong accuracy and efficiency on CCF-AIOps microservice datasets.

Findings

01. Achieves high accuracy in root cause localization.

02. Demonstrates efficiency on microservice datasets.

03. Effectively models temporal and causal dependencies.

Abstract

Root cause localization remains challenging in complex and large-scale microservice architectures. The complex fault propagation among microservices and the high dimensionality of telemetry data, including metrics, logs, and traces, limit the effectiveness of existing root cause analysis (RCA) methods. In this paper, a residual-connection-based RCA method using a large language model (LLM), named RC-LLM, is proposed. A residual-like hierarchical fusion structure is designed to integrate multi-source telemetry data, while the contextual reasoning capability of large language models is leveraged to model temporal and cross-microservice causal dependencies. Experimental results on CCF-AIOps microservice datasets demonstrate that RC-LLM achieves strong accuracy and efficiency in root cause analysis.
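The abstract does not specify how the residual-like hierarchical fusion is built, so the following is only a minimal sketch of the general idea of residual (skip-connection) fusion across telemetry sources. Everything in it is an assumption made for illustration: the function names `residual_fuse` and `mean_mix`, the fusion order (metrics, then logs, then traces), the fixed-length feature vectors, and the element-wise average that stands in for whatever learned layer the paper actually uses. It is not the authors' implementation, and it leaves out the LLM reasoning stage entirely.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_mix(state: Sequence[float], source: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a learned fusion layer: element-wise average."""
    if len(state) != len(source):
        raise ValueError("vectors must have the same length")
    return [(s + x) / 2.0 for s, x in zip(state, source)]


def residual_fuse(
    sources: Sequence[Sequence[float]],
    mix: Callable[[Sequence[float], Sequence[float]], Vector] = mean_mix,
) -> Vector:
    """Fuse feature vectors one source at a time with a skip connection.

    At each level the new state is: state + mix(state, next_source).
    The "+ state" term is the residual path, so information from earlier
    sources is carried forward unchanged alongside each fused update.
    """
    if not sources:
        raise ValueError("at least one source is required")
    state: Vector = list(sources[0])
    for source in sources[1:]:
        update = mix(state, source)
        state = [s + u for s, u in zip(state, update)]  # residual addition
    return state


# Toy feature vectors (invented values, one per telemetry source).
metrics = [1.0, 2.0]
logs = [3.0, 0.0]
traces = [0.0, 0.0]

fused = residual_fuse([metrics, logs, traces])
print(fused)  # [4.5, 4.5]
```

In this toy run the first level yields `[3.0, 3.0]` (metrics plus the metrics/logs average) and the second level yields `[4.5, 4.5]`. The point of the skip connection is that a weak or empty source, such as the all-zero trace vector here, cannot erase what earlier sources contributed.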

Taxonomy

Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · IoT and Edge/Fog Computing