Repairing Tool Calls Using Post-tool Execution Reflection and RAG
Jason Tsay, Zidane Wright, Gaodan Fang, Kiran Kate, Saurabh Jha, Yara Rizk

TL;DR
This paper introduces a reflection-based approach that combines LLMs with RAG to automatically repair tool calls in agentic systems, substantially improving tool call success rates and query-answering accuracy, particularly when troubleshooting documents are used as the retrieval source.
Contribution
We develop a novel post-tool execution reflection method that leverages LLMs and RAG to repair and improve tool calls in agentic systems, focusing on kubectl commands.
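The retrieval half of this method can be illustrated with a toy ranker. This is a minimal sketch, not the authors' pipeline: it assumes a tiny hand-written corpus (the `DOCS` texts are hypothetical) whose chunks are tagged as either "official" tool documentation or "troubleshooting" material, mirroring the two document types the paper compares, and it uses plain token overlap as a stand-in for whatever retriever the authors actually used. The query is the failed command concatenated with its error output.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical document chunks (assumption): in the paper's setting these would
# come from kubectl reference docs and kubectl troubleshooting documents.
DOCS: List[Tuple[str, str]] = [
    ("official", "kubectl get pods lists pods in the current namespace. "
                 "Use -n or --namespace to select another namespace."),
    ("troubleshooting", "If kubectl reports 'No resources found in default namespace', "
                        "the pods may live in a different namespace; retry with "
                        "-n <namespace> or --all-namespaces."),
    ("official", "kubectl logs prints the logs for a container in a pod."),
]

def tokenize(text: str) -> List[str]:
    """Lowercase and split into word-like tokens (hyphens kept, so flags survive)."""
    return re.findall(r"[a-z0-9\-]+", text.lower())

def retrieve(query: str, docs: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Return the top-k (source, text) chunks ranked by token overlap with the query.

    Token overlap is a simple stand-in for embedding-based retrieval.
    """
    q = Counter(tokenize(query))
    scored = []
    for source, text in docs:
        d = Counter(tokenize(text))
        # Count shared tokens, capped by how often each appears in the query.
        score = sum(min(q[t], d[t]) for t in q)
        scored.append((score, source, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(source, text) for score, source, text in scored[:k] if score > 0]

hits = retrieve("kubectl get pods No resources found in default namespace", DOCS)
for source, text in hits:
    print(source, "->", text)
```

In this toy corpus the troubleshooting chunk ranks first only because it repeats the error message verbatim; the sketch shows the shape of the retrieval step and does not reproduce the paper's comparison between document types.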
Findings
55% improvement in the pass rate of tool calls
36% increase in correctly answered queries
Troubleshooting docs outperform official documentation by 10%
Abstract
Agentic systems interact with external systems by calling tools such as Python functions, REST API endpoints, or command line tools such as kubectl in Kubernetes. These tool calls often fail for various syntactic and semantic reasons. Some less obvious semantic errors can only be identified and resolved after analyzing the tool's response. To repair these errors, we develop a post-tool execution reflection component that combines large language model (LLM)-based reflection with domain-specific retrieval-augmented generation (RAG) using documents describing both the specific tool being called and troubleshooting documents related to the tool. For this paper, we focus on the use case of the kubectl command line tool to manage Kubernetes, a platform for orchestrating cluster applications. Through a larger empirical study and a smaller manual evaluation, we find that our RAG-based…
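The post-tool execution reflection loop described in the abstract can be sketched as follows. This is a hedged illustration under several assumptions, not the authors' implementation: `execute`, `retrieve`, and `reflect` are hypothetical hooks; kubectl is replaced by a simulated function; the LLM is replaced by a rule-based stub; and the `shop` namespace and pod name are invented. The point it shows is that some failures only surface in the tool's response: querying the wrong namespace exits with code 0 but returns "No resources found", so the check inspects the output, not just the exit code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ToolResult:
    exit_code: int
    stdout: str
    stderr: str

def needs_repair(result: ToolResult) -> bool:
    """Inspect the tool's response, not just its exit code."""
    if result.exit_code != 0:  # hard failures, e.g. a malformed command
        return True
    # Semantic failure: the call "succeeds" but returns nothing useful.
    return "No resources found" in result.stdout + result.stderr

def reflect_and_repair(
    command: str,
    execute: Callable[[str], ToolResult],
    retrieve: Callable[[str], List[str]],
    reflect: Callable[[str, ToolResult, List[str]], str],
    max_attempts: int = 3,
) -> Tuple[str, ToolResult, bool]:
    """Execute, then reflect on the response with retrieved docs and retry.

    Runs the tool at most max_attempts times and returns
    (final command, final result, whether it looks successful).
    """
    result = execute(command)
    for _ in range(max_attempts - 1):
        if not needs_repair(result):
            return command, result, True
        context = retrieve(command + "\n" + result.stdout + result.stderr)
        command = reflect(command, result, context)
        result = execute(command)
    return command, result, not needs_repair(result)

# --- Hypothetical stand-ins for kubectl, the retriever, and the LLM ---

def fake_kubectl(command: str) -> ToolResult:
    # Simulated cluster: the only pods live in the hypothetical "shop" namespace.
    if "-n shop" in command:
        return ToolResult(0, "NAME       READY   STATUS\ncart-7d9   1/1     Running\n", "")
    return ToolResult(0, "", "No resources found in default namespace.\n")

def fake_retrieve(query: str) -> List[str]:
    return ["If no resources are found, the objects may be in another namespace; "
            "add -n <namespace>."]

def fake_reflect(command: str, result: ToolResult, context: List[str]) -> str:
    # Rule-based stand-in for an LLM prompted with the command, its output,
    # and the retrieved context; "shop" is assumed to come from the user request.
    if any("-n <namespace>" in doc for doc in context) and " -n " not in command:
        return command + " -n shop"
    return command

cmd, res, ok = reflect_and_repair("kubectl get pods", fake_kubectl, fake_retrieve, fake_reflect)
print(cmd, ok)
print(res.stdout)
```

Keeping `execute`, `retrieve`, and `reflect` as injected callables makes it easy to swap the stubs for a real command runner, a document index over kubectl or troubleshooting docs, and an actual LLM call.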
