Towards In-Depth Root Cause Localization for Microservices with Multi-Agent Recursion-of-Thought
Lingzhe Zhang, Tong Jia, Kangjin Wang, Chiming Duan, Minghua He, Rongqian Wang, Xi Peng, Meiling Wang, Gong Zhang, Renhai Chen, Ying Li

TL;DR
This paper introduces RCLAgent, a multi-agent recursive reasoning framework for root cause localization in complex microservice systems. It improves both accuracy and inference efficiency through parallel, trace graph-based analysis.
Contribution
The paper proposes a multi-agent recursion-of-thought approach to microservice root cause localization. It targets two limitations of existing LLM-based methods: context explosion and serial reasoning.
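To make the two limitations concrete, here is a minimal sketch of recursive, parallel exploration over a trace call graph. It is an illustration under stated assumptions, not RCLAgent's actual algorithm, which this summary does not describe. The `Span` type, the `is_anomalous` latency rule (a stand-in for an agent's judgment), and the `localize` function are all hypothetical names. Each recursive call sees only one span and its direct children, which keeps the context small, and sibling subtrees are explored concurrently rather than one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical trace node: one service call and its downstream calls."""
    service: str
    latency_ms: float
    baseline_ms: float
    children: List["Span"] = field(default_factory=list)


def is_anomalous(span: Span, factor: float = 2.0) -> bool:
    # Assumption: a simple latency threshold stands in for an agent's verdict.
    return span.latency_ms > factor * span.baseline_ms


def localize(span: Span) -> List[str]:
    """Return the deepest anomalous services reachable from `span`."""
    if not is_anomalous(span):
        return []
    # Each call inspects only this span and delegates every child to its own
    # recursive call; siblings run in parallel instead of serially.
    with ThreadPoolExecutor(max_workers=max(1, len(span.children))) as pool:
        results = list(pool.map(localize, span.children))
    culprits = [name for sub in results for name in sub]
    # If no child explains the anomaly, this span is the candidate root cause.
    return culprits or [span.service]


trace = Span("gateway", 900, 100, [
    Span("cart", 120, 100),
    Span("checkout", 850, 90, [
        Span("payment-db", 800, 40),
        Span("inventory", 30, 25),
    ]),
])
print(localize(trace))  # ['payment-db']
```

In this toy trace the anomaly at `gateway` is followed through `checkout` down to `payment-db`, while the healthy `cart` and `inventory` branches are pruned without further exploration.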
Findings
RCLAgent outperforms state-of-the-art methods in localization accuracy.
RCLAgent achieves higher inference efficiency.
Extensive experiments validate the effectiveness of RCLAgent.
Abstract
As modern microservice systems grow increasingly complex due to dynamic interactions and evolving runtime environments, they experience failures with rising frequency. Ensuring system reliability therefore critically depends on accurate root cause localization (RCL). While numerous traditional machine learning and deep learning approaches have been explored for this task, they often suffer from limited interpretability and poor transferability across deployments. More recently, large language model (LLM)-based methods have been proposed to address these issues. However, existing LLM-based approaches still face two fundamental limitations: context explosion, which dilutes critical evidence and degrades localization accuracy, and serial reasoning structures, which hinder deep causal exploration and impair inference efficiency. In this paper, we conduct a comprehensive study of both how…
