RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement
Will LeVine, Brendan Evers, Sam Saltwick, Abhay Venkatesh

TL;DR
RubricRefine is a training-free pre-execution method that improves tool-use agent reliability by verifying and repairing semantic contracts before execution, reducing errors and latency.
Contribution
It introduces a training-free approach to semantic contract verification and repair that improves tool-use agent reliability before any code is executed.
Findings
Achieves 0.86 accuracy on M3ToolEval without execution attempts
Up to 2.6x lower latency compared to prior baselines
Performance is consistent across multiple models and tasks
Abstract
Iterative self-refinement is a popular inference-time reliability technique, but its effectiveness in code-mode tool use depends heavily on the structure of the feedback signal: unstructured critique helps inconsistently across models, and even revision with real execution feedback improves only modestly ( vs. baseline). The dominant failures are inter-tool contract violations (wrong output shape, incorrect tool routing, broken argument provenance) that run to completion without raising errors, making runtime feedback insufficient. We introduce RubricRefine, a training-free method for pre-execution semantic contract verification that generates task- and registry-specific rubrics, scores candidate code against explicit contract checks, and iteratively repairs failures before any execution occurs. RubricRefine reaches , averaged across seven models, on M3ToolEval with…
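The score-then-repair loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the rubric here is hand-written rather than generated per task and registry, the repair step is a fixed stub standing in for a model call, and every name (`Check`, `called_names`, `score`, `refine`, the `get_price` and `convert_currency` tools) is a hypothetical chosen for illustration. The one property it does share with the description is that candidate code is inspected statically and never executed.

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Check:
    """One rubric item: a named, static predicate over candidate source code."""
    name: str
    passes: Callable[[str], bool]


def called_names(source: str) -> Set[str]:
    """Collect the names of plain function calls by parsing, not running, the code."""
    tree = ast.parse(source)
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


def score(source: str, rubric: List[Check]) -> Tuple[float, List[str]]:
    """Return the fraction of checks passed and the names of the failed checks."""
    failed = [c.name for c in rubric if not c.passes(source)]
    return 1 - len(failed) / len(rubric), failed


def refine(source: str, rubric: List[Check],
           repair: Callable[[str, List[str]], str],
           max_rounds: int = 3) -> Tuple[str, float]:
    """Score the candidate and repair failed checks until all pass or rounds run out."""
    for _ in range(max_rounds):
        _, failed = score(source, rubric)
        if not failed:
            break
        source = repair(source, failed)
    return source, score(source, rubric)[0]


# Hypothetical tool registry and a hand-written rubric for one task:
# "report the widget price in EUR".
REGISTRY = {"get_price", "convert_currency"}
RUBRIC = [
    Check("uses only registered tools", lambda src: called_names(src) <= REGISTRY),
    Check("routes through convert_currency",
          lambda src: "convert_currency" in called_names(src)),
]

# This candidate would run without raising, yet it skips the conversion step.
CANDIDATE = 'result = get_price("widget")'


def stub_repair(source: str, failed: List[str]) -> str:
    """Stand-in for a model-driven repair; returns a fixed corrected program."""
    return 'result = convert_currency(get_price("widget"), "EUR")'


final_source, final_score = refine(CANDIDATE, RUBRIC, stub_repair)
```

In this toy case the initial candidate passes one of two checks. The routing failure is caught by parsing alone, which mirrors the abstract's point that such violations run to completion without raising errors and so are invisible to runtime feedback.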
