Repairing Tool Calls Using Post-tool Execution Reflection and RAG

Jason Tsay; Zidane Wright; Gaodan Fang; Kiran Kate; Saurabh Jha; Yara Rizk

arXiv:2510.17874 · cs.SE · October 22, 2025

Open Access

TL;DR

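This paper introduces a post-tool execution reflection approach that combines LLM-based reflection with RAG to automatically repair failed tool calls in agentic systems. It reports a 55% pass rate improvement for tool call success and a 36% increase in correct query answering, with troubleshooting documents outperforming official documentation by 10%.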

Contribution

We develop a novel post-tool execution reflection method that leverages LLMs and RAG to repair and improve tool calls in agentic systems, focusing on kubectl commands.

Findings

1. 55% pass rate improvement for tool call success
2. 36% increase in correct query answering
3. Troubleshooting docs outperform official documentation by 10%

Abstract

Agentic systems interact with external systems by calling tools such as Python functions, REST API endpoints, or command line tools such as kubectl in Kubernetes. These tool calls often fail for various syntactic and semantic reasons. Some less obvious semantic errors can only be identified and resolved after analyzing the tool's response. To repair these errors, we develop a post-tool execution reflection component that combines large language model (LLM)-based reflection with domain-specific retrieval-augmented generation (RAG) using documents describing both the specific tool being called and troubleshooting documents related to the tool. For this paper, we focus on the use case of the kubectl command line tool to manage Kubernetes, a platform for orchestrating cluster applications. Through a larger empirical study and a smaller manual evaluation, we find that our RAG-based…
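
The reflect-after-execution loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `retrieve`, `run_with_reflection`, `fake_execute`, and `fake_reflect` are hypothetical, the retriever is a toy token-overlap ranker standing in for a real RAG index, the reflection step is a hard-coded rule standing in for an LLM, and the tool outputs are simulated strings rather than real kubectl output.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical document store mixing tool reference text and troubleshooting notes.
DOCS = [
    "troubleshooting: no resources found often means the wrong namespace; "
    "in this cluster the namespace is named prod",
    "reference: kubectl get pods lists pods in a namespace",
    "reference: kubectl logs prints container logs",
]


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by the number of tokens shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]


def run_with_reflection(
    command: str,
    execute: Callable[[str], str],
    reflect: Callable[[str, str, List[str]], Optional[str]],
    docs: List[str],
    max_repairs: int = 2,
) -> List[Tuple[str, str]]:
    """Execute a tool call, reflect on its response, and retry with a repaired call.

    `reflect` returns None when the response looks acceptable, otherwise a
    repaired command. Returns the list of (command, output) attempts.
    """
    history: List[Tuple[str, str]] = []
    for _ in range(max_repairs + 1):
        output = execute(command)
        history.append((command, output))
        # Retrieval is conditioned on both the call and the tool's response.
        context = retrieve(command + " " + output, docs)
        repaired = reflect(command, output, context)
        if repaired is None or repaired == command:
            break
        command = repaired
    return history


def fake_execute(command: str) -> str:
    """Simulated tool: only the 'prod' namespace has pods (made-up output)."""
    if command == "kubectl get pods -n prod":
        return "simulated: 2 pods listed"
    return "simulated: no resources found"


def fake_reflect(command: str, output: str, context: List[str]) -> Optional[str]:
    """Rule-based stand-in for LLM reflection over the retrieved context."""
    if "no resources found" not in output:
        return None
    for doc in context:
        if "namespace is named prod" in doc:
            return command.replace("-n production", "-n prod")
    return None


if __name__ == "__main__":
    attempts = run_with_reflection(
        "kubectl get pods -n production", fake_execute, fake_reflect, DOCS
    )
    for cmd, out in attempts:
        print(cmd, "->", out)
```

The point of the sketch is that the first call does not fail syntactically: the problem only becomes visible in the tool's response, which is why reflection and retrieval run after execution rather than before it.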

Taxonomy

Topics: Software System Performance and Reliability · Scientific Computing and Data Management · Topic Modeling