A Self-Healing Framework for Reliable LLM-Based Autonomous Agents
Cheonsu Jeong, Younggun Shin

TL;DR
This paper introduces a comprehensive self-healing framework for LLM-based autonomous agents, enhancing reliability through failure detection, assessment, and automated recovery in complex software systems.
Contribution
It presents an integrated reliability-aware self-healing framework with failure taxonomy, detection, and adaptive recovery strategies for LLM-based agents.
Findings
Significantly increases task success rates
Reduces failure propagation in multi-agent workflows
Enhances overall system robustness
Abstract
Autonomous agents based on Large Language Models (LLMs) are increasingly being utilized in complex software systems. However, reliability remains a significant challenge due to unpredictable failures such as hallucinations, execution errors, and inconsistent reasoning. This paper proposes a reliability-aware self-healing framework for LLM-based software agents. The framework integrates failure detection, reliability assessment, and automated recovery mechanisms. First, we define a taxonomy of failure types and introduce a quantitative reliability assessment model. Next, we propose a failure detection method that identifies abnormal agent behavior based on execution patterns and output consistency. Finally, we design a self-healing mechanism that dynamically recovers from failures through adaptive replanning and corrective prompting strategies. The proposed framework was implemented in a…
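As a rough illustration of the detection and recovery steps described in the abstract, the Python sketch below flags a failure when repeated samples from an agent disagree or raise errors, then retries with a corrective prompt. It is not the authors' implementation: the names `consistency_score` and `run_with_self_healing`, the sample count, the agreement threshold, and the corrective prompt wording are all assumptions made for this example, and the agent is modeled as a plain callable from prompt to string.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def consistency_score(outputs: List[str]) -> Tuple[str, float]:
    """Return the most common normalized output and the fraction of samples matching it."""
    counts = Counter(o.strip().lower() for o in outputs)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(outputs)


def run_with_self_healing(
    agent: Callable[[str], str],   # hypothetical agent interface: prompt -> output
    task: str,
    samples: int = 3,              # assumed number of samples per attempt
    threshold: float = 0.6,        # assumed minimum agreement to accept an answer
    max_repairs: int = 2,          # assumed cap on corrective retries
) -> Tuple[Optional[str], int]:
    """Return (answer, repairs_used), or (None, max_repairs) if recovery fails."""
    prompt = task
    for attempt in range(max_repairs + 1):
        outputs = []
        for _ in range(samples):
            try:
                outputs.append(agent(prompt))
            except Exception as exc:
                # Treat execution errors as a distinct, detectable output.
                outputs.append(f"<error:{type(exc).__name__}>")

        answer, score = consistency_score(outputs)
        failed = score < threshold or answer.startswith("<error:")
        if not failed:
            return answer, attempt

        # Corrective prompting: restate the task and expose the failed outputs.
        prompt = (
            f"{task}\n\nPrevious attempts were inconsistent or failed: {outputs}. "
            "Re-check each step and answer again."
        )
    return None, max_repairs
```

Majority agreement is only one possible consistency signal; the execution-pattern checks and adaptive replanning mentioned in the abstract would need additional state that this sketch does not model.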
