Towards Self-Improving Error Diagnosis in Multi-Agent Systems
Jiazheng Li, Emine Yilmaz, Bei Chen, Dieu-Thu Le

TL;DR
This paper presents ErrorProbe, a self-improving framework for diagnosing errors in multi-agent systems using a three-stage process that improves accuracy without requiring expert annotation.
Contribution
ErrorProbe introduces a novel, self-updating diagnostic framework that localizes errors at the step level in multi-agent systems without relying on annotated data.
Findings
ErrorProbe outperforms baselines in step-level error localization.
The verified episodic memory enhances cross-domain transfer.
ErrorProbe effectively identifies responsible agents and error origins.
Abstract
Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation. Existing diagnostic approaches often rely on expensive expert annotation or "LLM-as-a-judge" paradigms, which struggle to pinpoint decisive error steps within extended contexts. In this paper, we introduce ErrorProbe, a self-improving framework for semantic failure attribution that identifies responsible agents and the originating error step. The framework operates via a three-stage pipeline: (1) operationalizing the MAS failure taxonomy to detect local anomalies, (2) performing symptom-driven backward tracing to prune irrelevant context, and (3) employing a specialized multi-agent team (Strategist, Investigator, Arbiter) to validate error hypotheses…
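To make stage (2) of the pipeline more concrete, here is a minimal sketch of what backward tracing from a symptom could look like. It is an illustration only, not the authors' implementation: the `Step` record, its `depends_on` field, the `backward_trace` function, and the example agent names are all hypothetical, and the sketch assumes that each step in a trace explicitly lists the earlier steps it consumed.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step in an interaction trace (hypothetical structure)."""
    index: int
    agent: str
    # Assumption: each step records the earlier steps whose output it used.
    depends_on: list[int] = field(default_factory=list)


def backward_trace(steps: list[Step], symptom_index: int) -> list[Step]:
    """Keep only the symptom step and the steps it transitively depends on."""
    by_index = {s.index: s for s in steps}
    keep: set[int] = set()
    stack = [symptom_index]
    while stack:
        i = stack.pop()
        if i in keep:
            continue
        keep.add(i)
        stack.extend(by_index[i].depends_on)
    # Preserve the original trace order in the pruned result.
    return [s for s in steps if s.index in keep]


# Toy trace: step 2 never feeds into the failing step 3, so it is pruned.
trace = [
    Step(0, "planner"),
    Step(1, "coder", [0]),
    Step(2, "searcher"),
    Step(3, "tester", [1]),
]
pruned = backward_trace(trace, symptom_index=3)
print([s.index for s in pruned])  # [0, 1, 3]
```

In this toy setting the pruned trace is what a later validation stage would inspect, so an unrelated step such as the `searcher` call no longer takes up context.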
