RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

Will LeVine; Brendan Evers; Sam Saltwick; Abhay Venkatesh

arXiv:2605.09730·cs.LG·May 19, 2026

TL;DR

RubricRefine is a training-free method that improves tool-use agent reliability by checking candidate code against semantic contracts and repairing violations before anything is executed, with up to 2.6x lower latency than prior baselines.

Contribution

The paper introduces a training-free approach to pre-execution semantic contract verification and repair that improves the reliability of tool-use agents.

Findings

01 Reaches 0.86 accuracy on M3ToolEval, averaged across seven models, without any execution attempts

02 Up to 2.6x lower latency than prior baselines

03 Performance is consistent across multiple models and tasks

Abstract

Iterative self-refinement is a popular inference-time reliability technique, but its effectiveness in code-mode tool use depends heavily on the structure of the feedback signal: unstructured critique helps inconsistently across models, and even revision with real execution feedback improves only modestly ($0.75$ vs. $0.65$ baseline). The dominant failures are inter-tool contract violations (wrong output shape, incorrect tool routing, broken argument provenance) that run to completion without raising errors, making runtime feedback insufficient. We introduce RubricRefine, a training-free method for pre-execution semantic contract verification that generates task- and registry-specific rubrics, scores candidate code against explicit contract checks, and iteratively repairs failures before any execution occurs. RubricRefine reaches $0.86$, averaged across seven models, on M3ToolEval with…
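
The score-then-repair loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Check`, `called_names`, `score`, `refine`, and `toy_repair` are hypothetical, as are the tools `search_flights`, `search_hotels`, and `convert_currency`. Simple static checks on the parsed source stand in for model-generated rubrics, and a string-replacing stub stands in for model-driven repair. The one property the sketch is meant to show is that the candidate code is only inspected, never run.

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Check:
    """One explicit contract check (hypothetical structure)."""
    name: str
    passes: Callable[[str], bool]  # inspects source text only; never executes it


def called_names(code: str) -> Set[str]:
    """Return names of functions called in `code`, found by parsing, not running."""
    tree = ast.parse(code)
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


def score(code: str, rubric: List[Check]) -> Tuple[float, List[str]]:
    """Fraction of checks passed, plus the names of the failed checks."""
    failed = [check.name for check in rubric if not check.passes(code)]
    return 1 - len(failed) / len(rubric), failed


def refine(code: str, rubric: List[Check],
           repair: Callable[[str, List[str]], str],
           max_rounds: int = 3) -> Tuple[str, float]:
    """Repair until every check passes or the round budget is spent."""
    for _ in range(max_rounds):
        _, failed = score(code, rubric)
        if not failed:
            break
        code = repair(code, failed)  # assumption: a model call in practice
    return code, score(code, rubric)[0]


# Toy rubric for a hypothetical task: "report the flight price in EUR".
rubric = [
    Check("routes to search_flights",
          lambda c: "search_flights" in called_names(c)),
    Check("does not call search_hotels",
          lambda c: "search_hotels" not in called_names(c)),
]


def toy_repair(code: str, failed: List[str]) -> str:
    """Stub repair step: fixes the one routing mistake this example contains."""
    return code.replace("search_hotels", "search_flights")


# Wrong tool routing: this would run to completion without raising an error.
candidate = 'price = search_hotels("Paris")\nprint(convert_currency(price, "EUR"))'
fixed, final_score = refine(candidate, rubric, toy_repair)
```

The candidate here calls the wrong tool but is otherwise valid code, which is the kind of failure the abstract says runtime feedback misses; the rubric flags it from the source alone, and the loop stops as soon as all checks pass.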
