Towards Self-Improving Error Diagnosis in Multi-Agent Systems

Jiazheng Li; Emine Yilmaz; Bei Chen; Dieu-Thu Le

arXiv:2604.17658 · cs.MA · April 21, 2026


TL;DR

This paper presents ErrorProbe, a self-improving framework for diagnosing errors in LLM-based multi-agent systems. Its three-stage process attributes a failure to the responsible agent and the originating error step without requiring expert annotation.

Contribution

ErrorProbe introduces a novel, self-updating diagnostic framework that localizes errors at the step level in multi-agent systems without relying on annotated data.

Findings

1. ErrorProbe outperforms baselines in step-level error localization.
2. The verified episodic memory enhances cross-domain transfer.
3. ErrorProbe effectively identifies responsible agents and error origins.
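The first and third findings concern attribution at two granularities: which agent is responsible for a failure and which step is the decisive error. The sketch below shows one way such attributions could be scored. It assumes exact-match comparison against gold labels. The `Attribution` type and the `attribution_accuracy` helper are hypothetical, and this excerpt does not state the paper's exact evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """Hypothetical record: the responsible agent and the index of the decisive error step."""
    agent: str
    step: int


def attribution_accuracy(preds: list[Attribution], golds: list[Attribution]) -> tuple[float, float]:
    """Return (agent-level accuracy, step-level accuracy), scoring each pair by exact match."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    agent_hits = sum(p.agent == g.agent for p, g in zip(preds, golds))
    step_hits = sum(p.step == g.step for p, g in zip(preds, golds))
    n = len(golds)
    return agent_hits / n, step_hits / n


# Toy labels, invented for illustration only.
golds = [Attribution("Searcher", 1), Attribution("Coder", 5), Attribution("Planner", 0)]
preds = [Attribution("Searcher", 1), Attribution("Coder", 7), Attribution("Verifier", 4)]
agent_acc, step_acc = attribution_accuracy(preds, golds)
print(agent_acc, step_acc)  # agent-level 2/3, step-level 1/3
```

In the toy example, the second prediction names the right agent but the wrong step. This is why step-level localization is the harder target: under exact-match scoring it can only be equal to or lower than agent-level accuracy when the right step implies the right agent.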

Abstract

Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation. Existing diagnostic approaches often rely on expensive expert annotation or "LLM-as-a-judge" paradigms, which struggle to pinpoint decisive error steps within extended contexts. In this paper, we introduce ErrorProbe, a self-improving framework for semantic failure attribution that identifies responsible agents and the originating error step. The framework operates via a three-stage pipeline: (1) operationalizing the MAS failure taxonomy to detect local anomalies, (2) performing symptom-driven backward tracing to prune irrelevant context, and (3) employing a specialized multi-agent team (Strategist, Investigator, Arbiter) to validate error hypotheses…
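Stage (2) of the pipeline, symptom-driven backward tracing, can be illustrated with a small sketch. It assumes a hypothetical trace schema: each `Step` carries a `depends_on` list of the earlier steps whose outputs it consumed and an `anomalous` flag that stands in for stage (1)'s taxonomy checks. Neither this schema nor the helper names `backward_trace` and `earliest_anomaly` come from the paper. Starting from the step where the failure surfaces, the sketch walks dependency edges backwards, drops every step the symptom does not depend on, and then proposes the earliest flagged step that remains as a candidate origin.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical sketch; not ErrorProbe's implementation.
@dataclass
class Step:
    """One step of a MAS interaction trace (assumed schema)."""
    index: int
    agent: str
    content: str
    depends_on: list[int] = field(default_factory=list)  # earlier steps whose output this step used
    anomalous: bool = False  # assumed to be set by a stage-1 taxonomy check


def backward_trace(trace: list[Step], symptom: int) -> list[Step]:
    """Keep the symptom step and its transitive dependencies, in trace order; prune the rest."""
    by_index = {s.index: s for s in trace}
    visited: set[int] = set()
    stack = [symptom]
    while stack:  # iterative depth-first walk over dependency edges
        i = stack.pop()
        if i in visited:
            continue
        visited.add(i)
        stack.extend(by_index[i].depends_on)
    return [s for s in trace if s.index in visited]


def earliest_anomaly(relevant: list[Step]) -> Step | None:
    """Return the first flagged step in the pruned context as a candidate origin, if any."""
    for s in relevant:
        if s.anomalous:
            return s
    return None


# Toy trace, invented for illustration only.
trace = [
    Step(0, "Planner", "Plan: search flights, then book", []),
    Step(1, "Searcher", "Found flight on wrong date", [0], anomalous=True),
    Step(2, "Weather", "Forecast: sunny", []),
    Step(3, "Booker", "Booked flight from step 1", [1]),
    Step(4, "Verifier", "Task failed: date mismatch", [3, 0]),
]
relevant = backward_trace(trace, symptom=4)
print([s.index for s in relevant])  # [0, 1, 3, 4]
origin = earliest_anomaly(relevant)
print(origin.agent, origin.index)  # Searcher 1
```

Here the weather step is pruned because the failing verification never consumed it. Picking the earliest flagged step is a deliberately simple heuristic. In ErrorProbe, the resulting hypothesis would still go to the Strategist, Investigator, and Arbiter team for validation, a stage this sketch does not model.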
