Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents
Anh Ta, Junjie Zhu, Shahin Shayandeh

TL;DR
This paper introduces an inference-time feedback mechanism for tool-calling agents: a secondary reviewer agent evaluates and corrects tool calls before they are executed, improving accuracy and robustness.
Contribution
The paper proposes a multi-agent architecture in which a dedicated reviewer agent mitigates errors proactively during inference, rather than relying on post-hoc evaluation.
Findings
+5.5% on irrelevance detection
+7.1% on multi-turn tasks
The choice of reviewer model significantly affects performance
Abstract
Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or retraining, and fundamentally cannot course-correct the agent in real time. To close this gap, we move evaluation into the execution loop at inference time: a specialized reviewer agent evaluates provisional tool calls prior to execution, shifting the paradigm from post-hoc recovery to proactive evaluation and error mitigation. In practice, this architecture establishes a clear separation of concerns between the primary execution agent and a secondary review agent. As with any multi-agent system, the reviewer can introduce new errors while correcting others, yet no prior work to our knowledge has…
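The review-before-execute loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `ToolCall`, `Review`, `run_with_review`, the `max_revisions` budget, and the toy stub agents are all hypothetical names chosen for illustration, and the behavior when the reviewer never approves a call (here, nothing is executed) is an assumption, since the summary does not specify it.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A provisional tool call proposed by the primary agent (hypothetical type)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Review:
    """The reviewer's verdict on a provisional call (hypothetical type)."""
    approved: bool
    feedback: str = ""


def run_with_review(task, propose, review, execute, max_revisions=2):
    """Execute a tool call only after the reviewer approves it.

    propose(task, feedback) -> ToolCall   (primary execution agent)
    review(task, call)      -> Review     (secondary reviewer agent)
    execute(call)           -> result     (the actual tool)
    """
    feedback = None
    for _ in range(max_revisions + 1):
        call = propose(task, feedback)
        verdict = review(task, call)
        if verdict.approved:
            return execute(call)
        # Rejected: pass the reviewer's feedback back to the primary agent.
        feedback = verdict.feedback
    # Assumption: if no call is approved within the budget, execute nothing.
    return None


# Toy stand-ins for the two agents and the tool, for demonstration only.
def toy_propose(task, feedback):
    # The first attempt omits the city; the retry uses the feedback to fix it.
    city = "Paris" if feedback else ""
    return ToolCall("get_weather", {"city": city})


def toy_review(task, call):
    if not call.args.get("city"):
        return Review(False, "missing required argument: city")
    return Review(True)


def toy_execute(call):
    return f"{call.name}({call.args['city']})"


result = run_with_review("What is the weather in Paris?",
                         toy_propose, toy_review, toy_execute)
print(result)
```

In this sketch the separation of concerns is explicit: the primary agent only proposes, the reviewer only judges, and the tool runs only on an approved call. A real reviewer could also wrongly reject or wrongly approve a call, which is the trade-off the abstract points to.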
