The Multi-Agent Fault Localization System Based on Monte Carlo Tree Search Approach
Rui Ren

TL;DR
This paper introduces KnowledgeMind, a multi-agent system that uses Monte Carlo Tree Search and a reward mechanism for fault localization in microservices. It cuts the context window to one-tenth of what existing methods need, mitigates hallucinations, and improves localization accuracy by 49.29% to 128.35%.
Contribution
The paper presents a novel multi-agent LLM system with Monte Carlo Tree Search and a reward mechanism, reducing context size and hallucinations while improving localization accuracy.
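For readers unfamiliar with Monte Carlo Tree Search, the following is a minimal, generic UCT sketch applied to a toy fault-localization problem. It is not KnowledgeMind's implementation: this summary does not describe the paper's tree structure, agents, or reward mechanism. The call graph `CALLS`, the scores in `ANOMALY`, and the choice to use a service's anomaly score as the reward are all hypothetical stand-ins chosen for illustration.

```python
import math
import random

# Hypothetical toy data (not from the paper): which services each service
# calls, and a made-up anomaly score in [0, 1] for each service.
CALLS = {
    "frontend": ["cart", "search"],
    "cart": ["db", "cache"],
    "search": ["index"],
    "db": [], "cache": [], "index": [],
}
ANOMALY = {"frontend": 0.4, "cart": 0.6, "search": 0.1,
           "db": 0.9, "cache": 0.2, "index": 0.1}


class Node:
    """One search-tree node, corresponding to a service on a call path."""

    def __init__(self, service, parent=None):
        self.service = service
        self.parent = parent
        self.children = []
        self.untried = list(CALLS[service])  # callees not yet expanded
        self.visits = 0
        self.total_reward = 0.0


def uct_child(node, c=1.4):
    """Pick the child with the highest UCT value (mean reward + exploration bonus)."""
    return max(
        node.children,
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(service, rng):
    """Follow random calls down to a leaf service; assumed reward is its anomaly score."""
    while CALLS[service]:
        service = rng.choice(CALLS[service])
    return ANOMALY[service]


def mcts(root_service, iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(root_service)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCT.
        while not node.untried and node.children:
            node = uct_child(node)
        # Expansion: add one unexplored callee, if any remain.
        if node.untried:
            child = Node(node.untried.pop(), parent=node)
            node.children.append(child)
            node = child
        # Simulation: estimate the reward from this node.
        reward = rollout(node.service, rng)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Report the leaf reached by following the most-visited children.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.service


print(mcts("frontend"))
```

On this toy graph the search concentrates its visits on the `frontend` → `cart` → `db` path, because `db` carries the highest assumed anomaly score. A real system would replace the fixed scores with a learned or rule-based reward over observed telemetry.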
Findings
Achieves 49.29% to 128.35% improvement in localization accuracy.
Reduces context window size to one-tenth of existing methods.
Effectively mitigates hallucinations during inference.
Abstract
In real-world scenarios, the highly decoupled and flexible nature of microservices poses greater challenges to system reliability. The more frequent occurrence of incidents has created a demand for Root Cause Analysis (RCA) methods that enable rapid identification of and recovery from incidents. Large language models (LLMs) provide a new path for quickly locating and recovering from incidents by combining their powerful generalization ability with expert experience. Current LLM-based RCA frameworks build on ideas such as ReAct and Chain-of-Thought, but LLM hallucination and the propagating nature of anomalies often lead to incorrect localization results. Moreover, the massive amount of anomalous information generated in large, complex systems presents a major challenge for the context window length of LLMs. To address these challenges, we propose KnowledgeMind, an…
