R2V Agent: Teaching SLMs When to Ask for Help

Raghu Vamshi Hemadri; Humaira Firdowse Mohammed; Rishabh Maheshwary; Srivatsava Daruru; Sagar Davasam; Vikas Yadav; Srinivas Sunkara; Sai Rajeswar

arXiv:2605.16604·cs.LG·May 19, 2026

TL;DR

R2V-Agent is a risk-calibrated SLM-LLM routing framework for interactive agents. It escalates to a stronger teacher LLM only at steps where the local small language model (SLM) is likely to fail, keeping frontier-model cost low while preserving task success.

Contribution

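The paper introduces a calibrated step-level router that estimates residual failure risk at each step of a trajectory, after the SLM has been trained, and escalates to the teacher LLM only when intervention is warranted. Unlike cascades that route a whole query before execution, this lets routing track task difficulty as it shifts mid-trajectory.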

Findings

01

Achieves 94.3% success on HumanEval+ with only 0.60% escalations.

02

Recovers TextWorld success from 64.6% to 98.2% with 41.7% escalations.

03

Reaches 93.3% success on TerminalBench at 33.9% LLM calls.

Abstract

Efficient agentic systems should incur expensive frontier-model costs only on decisions where a cheaper local model is likely to fail. Existing LLM cascades usually route whole queries before execution, but task difficulty shifts mid-trajectory (after flaky tool calls, truncated observations, or compounding local errors), making pre-execution routing brittle. We introduce R2V-Agent, a risk-calibrated SLM-LLM routing framework for interactive agents. R2V combines four components: a distilled small language model (SLM) policy, a stronger teacher LLM, a lightweight process verifier that scores candidate actions at each step, and a calibrated step-level router. The router is our central contribution: after the SLM is trained, it estimates residual failure risk at each step and escalates only when teacher intervention is warranted. To make the routing problem well-defined, we first…
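The step-level escalation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the router computes or calibrates risk, so the sketch assumes, purely for illustration, that risk is one minus the verifier's best candidate score and that a fixed threshold triggers escalation. All names (`route_step`, `slm_propose`, `verifier_score`, `teacher_act`, `risk_threshold`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepDecision:
    action: str      # action that will be executed at this step
    escalated: bool  # True if the teacher LLM produced the action
    risk: float      # estimated failure risk used for the decision


def route_step(
    state: str,
    slm_propose: Callable[[str], List[str]],
    verifier_score: Callable[[str, str], float],
    teacher_act: Callable[[str], str],
    risk_threshold: float = 0.5,
) -> StepDecision:
    """Decide, for a single step, whether to keep the SLM action or escalate.

    ASSUMPTION: risk = 1 - best verifier score. The paper's router is a
    separately calibrated component; this stand-in only shows the control flow.
    """
    candidates = slm_propose(state)
    if not candidates:
        # No local proposal at all: treat as maximal risk and escalate.
        return StepDecision(teacher_act(state), True, 1.0)

    # Score every SLM candidate with the process verifier and keep the best.
    scored = [(verifier_score(state, action), action) for action in candidates]
    best_score, best_action = max(scored, key=lambda pair: pair[0])

    risk = 1.0 - best_score
    if risk > risk_threshold:
        # Local model is likely to fail here: pay for the teacher on this step only.
        return StepDecision(teacher_act(state), True, risk)
    return StepDecision(best_action, False, risk)


# Toy stand-ins (hypothetical) to exercise the control flow.
def toy_slm(state: str) -> List[str]:
    return ["ls", "cat notes.txt"]


def toy_verifier(state: str, action: str) -> float:
    # Confident only about "ls" in the clean starting state.
    return 0.9 if (action == "ls" and state == "start") else 0.2


def toy_teacher(state: str) -> str:
    return "teacher_action"


easy = route_step("start", toy_slm, toy_verifier, toy_teacher)
hard = route_step("truncated observation", toy_slm, toy_verifier, toy_teacher)
```

In this toy run, the clean state keeps the SLM's action, while the degraded state (standing in for a truncated observation) crosses the threshold and is handed to the teacher. Because the decision is made per step rather than per query, a trajectory can escalate once and then return to the local model.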
