R2V Agent: Teaching SLMs When to Ask for Help
Raghu Vamshi Hemadri, Humaira Firdowse Mohammed, Rishabh Maheshwary, Srivatsava Daruru, Sagar Davasam, Vikas Yadav, Srinivas Sunkara, Sai Rajeswar

TL;DR
R2V-Agent is a risk-aware routing framework for interactive language-model agents. A small local model handles each step and escalates to a larger model only when it is likely to fail, which improves both efficiency and reliability.
Contribution
The paper introduces a step-level router that estimates residual failure risk at each step, so that an SLM-LLM cascade can respond to task difficulty that changes mid-trajectory rather than routing whole queries before execution.
Findings
Achieves 94.3% success on HumanEval+ with only 0.60% escalations.
Recovers TextWorld success from 64.6% to 98.2% with 41.7% escalations.
Reaches 93.3% success on TerminalBench with 33.9% LLM calls.
Abstract
Efficient agentic systems should incur expensive frontier-model costs only on decisions where a cheaper local model is likely to fail. Existing LLM cascades usually route whole queries before execution, but task difficulty shifts mid-trajectory (after flaky tool calls, truncated observations, or compounding local errors), making pre-execution routing brittle. We introduce R2V-Agent, a risk-calibrated SLM-LLM routing framework for interactive agents. R2V combines four components: a distilled small language model (SLM) policy, a stronger teacher LLM, a lightweight process verifier that scores candidate actions at each step, and a calibrated step-level router. The router is our central contribution: after the SLM is trained, it estimates residual failure risk at each step and escalates only when teacher intervention is warranted. To make the routing problem well-defined, we first…
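The step-level routing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `route_step`, `slm_propose`, `verifier_score`, `teacher_act`, and `risk_threshold` are hypothetical, the verifier score is assumed to lie in [0, 1] and to approximate the probability that an action succeeds, and a fixed threshold stands in for the paper's calibrated router.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class StepDecision:
    action: str      # action that will be executed at this step
    escalated: bool  # True if the teacher LLM produced the action
    risk: float      # estimated residual failure risk of the SLM's best action


def route_step(
    state: str,
    slm_propose: Callable[[str], Sequence[str]],   # hypothetical: SLM candidate actions
    verifier_score: Callable[[str, str], float],   # hypothetical: score in [0, 1]
    teacher_act: Callable[[str], str],             # hypothetical: teacher LLM action
    risk_threshold: float = 0.5,                   # assumed fixed threshold
) -> StepDecision:
    """Pick the SLM's best-scored action, or escalate when its risk is too high."""
    candidates = list(slm_propose(state))
    if not candidates:
        # No local proposal at all: treat as maximal risk and escalate.
        return StepDecision(teacher_act(state), True, 1.0)

    # Score every candidate with the process verifier and keep the best one.
    best_score, best_action = max(
        ((verifier_score(state, a), a) for a in candidates),
        key=lambda pair: pair[0],
    )

    # Assumption: risk is the complement of the best verifier score.
    risk = 1.0 - best_score
    if risk > risk_threshold:
        return StepDecision(teacher_act(state), True, risk)
    return StepDecision(best_action, False, risk)
```

Calling `route_step` once per environment step, rather than once per task, is what lets the decision react to difficulty that appears mid-trajectory. Lowering `risk_threshold` trades more teacher calls for fewer local failures.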
