AgentFixer: From Failure Detection to Fix Recommendations in LLM Agentic Systems

Hadar Mulian; Sergey Zeltyn; Ido Levy; Liane Galanti; Avi Yaeli; Segev Shlomov

arXiv:2603.29848·cs.AI·April 1, 2026

TL;DR

This paper presents a validation framework for LLM-based agentic systems that detects failures, traces their root causes, and recommends fixes by combining lightweight rule-based checks with LLM-as-a-judge assessments.

Contribution

The paper introduces a validation framework with fifteen failure-detection tools, two root-cause analysis modules, and self-reflection capabilities, supporting systematic diagnosis and improvement of LLM agentic systems.

Findings

01. Recurrent planner misalignments and schema violations identified.

02. Refined prompting strategies improved accuracy of mid-sized models.

03. Interactive self-reflection enabled actionable insights and focused improvements.

Abstract

We introduce a comprehensive validation framework for LLM-based agentic systems that provides systematic diagnosis and improvement of reliability failures. The framework includes fifteen failure-detection tools and two root-cause analysis modules that jointly uncover weaknesses across input handling, prompt design, and output generation. It integrates lightweight rule-based checks with LLM-as-a-judge assessments to support structured incident detection, classification, and repair. We applied the framework to IBM CUGA, evaluating its performance on the AppWorld and WebArena benchmarks. The analysis revealed recurrent planner misalignments, schema violations, brittle prompt dependencies, and more. Based on these insights, we refined both prompting and coding strategies, maintaining CUGA's benchmark results while enabling mid-sized models such as Llama 4 and Mistral Medium to achieve…
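The abstract's pairing of rule-based checks with LLM-as-a-judge assessments can be illustrated with a minimal sketch. Nothing here comes from the paper's code: the `Incident` record, the function names, the category labels, and the judge prompt are all hypothetical, and the judge is passed in as a plain callable so that any model client (or a stub) can stand in for it.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Incident:
    """Hypothetical incident record; the paper's actual schema is not given."""
    detector: str
    category: str
    detail: str


def check_schema(raw_output: str, required_keys: set[str]) -> Optional[Incident]:
    """Rule-based check: the output must be a JSON object with the required keys."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return Incident("schema_check", "schema_violation", f"invalid JSON: {exc.msg}")
    if not isinstance(parsed, dict):
        return Incident("schema_check", "schema_violation", "expected a JSON object")
    missing = required_keys - set(parsed)
    if missing:
        return Incident("schema_check", "schema_violation", f"missing keys: {sorted(missing)}")
    return None


def check_plan_alignment(
    task: str, raw_output: str, judge: Callable[[str], str]
) -> Optional[Incident]:
    """LLM-as-a-judge check: ask a model whether the output serves the task.

    `judge` is an assumed interface: it takes a prompt and returns text that
    starts with ALIGNED or MISALIGNED.
    """
    prompt = (
        "Task:\n" + task + "\n\nAgent output:\n" + raw_output + "\n\n"
        "Answer ALIGNED or MISALIGNED, followed by a one-line reason."
    )
    verdict = judge(prompt).strip()
    if verdict.upper().startswith("MISALIGNED"):
        return Incident("plan_judge", "planner_misalignment", verdict)
    return None


def detect_incidents(
    task: str, raw_output: str, required_keys: set[str], judge: Callable[[str], str]
) -> list[Incident]:
    """Run the cheap rule-based check first; call the judge only on well-formed output."""
    schema_incident = check_schema(raw_output, required_keys)
    if schema_incident is not None:
        return [schema_incident]
    judged = check_plan_alignment(task, raw_output, judge)
    return [judged] if judged is not None else []
```

Ordering the checks this way is a design choice of the sketch, not a claim about the framework: the deterministic check is cheap and filters out malformed outputs before any model call is spent on them.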
