SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models
Yuxuan Jiang, Francis Ferraro

TL;DR
SCRIBE is a structured reward modeling framework that improves tool-using language models by grounding rewards in skill prototypes, leading to state-of-the-art results on reasoning and tool-interaction benchmarks.
Contribution
The paper presents a novel mid-level abstraction for reward modeling based on skill prototypes, which reduces reward variance and improves multi-step reasoning in language models.
Findings
Achieved 63.3% accuracy on AIME25 with Qwen3-4B.
Significantly increased success in multi-turn tool interactions.
Demonstrated additive benefits to low-level tool optimizations.
Abstract
Training reliable tool-augmented agents remains a significant challenge, largely due to the difficulty of credit assignment in multi-step reasoning. While process-level reward models offer a promising direction, existing LLM-based judges often produce noisy and inconsistent signals because they lack fine-grained, task-specific rubrics to distinguish high-level planning from low-level execution. In this work, we introduce SCRIBE (Skill-Conditioned Reward with Intermediate Behavioral Evaluation), a reinforcement learning framework that intervenes at a novel mid-level abstraction. SCRIBE grounds reward modeling in a curated library of skill prototypes, transforming open-ended LLM evaluation into a constrained verification problem. By routing each subgoal to a corresponding prototype, the reward model is equipped with precise, structured rubrics that substantially reduce reward variance.…
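The routing idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the prototype names, the keyword-overlap router, and the rubric checks are all hypothetical, and simple string predicates stand in for the LLM-based judge that the abstract describes. The sketch only shows how routing a subgoal to a prototype turns open-ended scoring into a fixed checklist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SkillPrototype:
    """Hypothetical skill prototype: a name, routing keywords, and a rubric."""
    name: str
    keywords: List[str]
    # Each rubric item maps a label to a yes/no check on the step text.
    # Assumption: in the paper an LLM judge verifies rubric items; here we
    # use plain string predicates so the example stays runnable.
    rubric: Dict[str, Callable[[str], bool]]


# Illustrative library; these two prototypes are invented for the example.
LIBRARY = [
    SkillPrototype(
        name="numeric_computation",
        keywords=["compute", "calculate", "evaluate"],
        rubric={
            "calls_tool": lambda step: "python(" in step,
            "states_result": lambda step: "result" in step.lower(),
        },
    ),
    SkillPrototype(
        name="verification",
        keywords=["check", "verify", "confirm"],
        rubric={
            "references_prior_value": lambda step: "previous" in step.lower(),
            "states_verdict": lambda step: any(
                w in step.lower() for w in ("consistent", "mismatch")
            ),
        },
    ),
]


def route(subgoal: str) -> SkillPrototype:
    """Pick the prototype whose keywords overlap most with the subgoal.

    Assumption: keyword overlap is a placeholder router; ties go to the
    first prototype in LIBRARY.
    """
    words = set(subgoal.lower().split())
    return max(LIBRARY, key=lambda p: len(words & set(p.keywords)))


def mid_level_reward(subgoal: str, step: str) -> float:
    """Score a step as the fraction of the routed prototype's rubric it passes."""
    proto = route(subgoal)
    passed = [check(step) for check in proto.rubric.values()]
    return sum(passed) / len(passed)


if __name__ == "__main__":
    print(route("compute the sum of the roots").name)
    print(mid_level_reward("check the value", "The previous value was 12."))
```

Because every step routed to the same prototype is scored against the same checklist, two similar steps cannot receive very different scores for unrelated reasons, which is one plausible reading of how structured rubrics reduce reward variance.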
