Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG

Yufeng Wang; Lu Wei; Haibin Ling

arXiv:2511.09803·cs.CL·April 15, 2026

TL;DR

The paper introduces TARG, a training-free adaptive retrieval gating method for RAG that improves efficiency and accuracy by selectively retrieving based on a lightweight uncertainty score from the base model's draft.

Contribution

TARG is a novel, training-free, model-agnostic gating policy that reduces retrieval and latency in RAG without sacrificing performance, using only draft logits.

Findings

01. TARG reduces retrieval by 70-90% while maintaining or improving accuracy.

02. The margin signal from prefix logits is a robust default for modern instruction-tuned LLMs.

03. TARG remains close to Never-RAG in overhead, with significant efficiency gains.

Abstract

Retrieval-Augmented Generation (RAG) improves factuality, but retrieving for every query often hurts quality while inflating tokens and latency. We propose Training-free Adaptive Retrieval Gating (TARG), a single-shot policy that decides when to retrieve using only a short, no-context draft from the base model. From the draft's prefix logits, TARG computes lightweight uncertainty scores (mean token entropy, a margin signal derived from the top-1/top-2 logit gap via a monotone link, or small-N variance across a handful of stochastic prefixes) and triggers retrieval only when the score exceeds a threshold. The gate is model-agnostic, adds only tens to hundreds of draft tokens, and requires no additional training or auxiliary heads. On five QA benchmarks spanning short-answer (NQ-Open, TriviaQA, PopQA), multi-hop (MuSiQue), and long-form (ASQA) tasks, TARG consistently pushes the…
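To make the gating idea concrete, here is a minimal sketch of two of the scores the abstract names (mean token entropy and a margin signal) together with the threshold test. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `scale` parameter, the thresholds, and in particular the link `exp(-gap / scale)` are hypothetical choices, since the abstract says only that the link is monotone. The small-N variance score is omitted because the abstract does not say how it is computed.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_token_entropy(prefix_logits: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over the draft prefix positions."""
    entropies = []
    for logits in prefix_logits:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return sum(entropies) / len(entropies)


def margin_score(prefix_logits: Sequence[Sequence[float]], scale: float = 1.0) -> float:
    """Average uncertainty derived from the top-1/top-2 logit gap.

    ASSUMPTION: the link exp(-gap / scale) is a hypothetical monotone
    (decreasing) link chosen for illustration; a small gap maps to a
    score near 1, a large gap to a score near 0.
    Each position needs at least two logits.
    """
    scores = []
    for logits in prefix_logits:
        top1, top2 = sorted(logits, reverse=True)[:2]
        scores.append(math.exp(-(top1 - top2) / scale))
    return sum(scores) / len(scores)


def should_retrieve(score: float, threshold: float) -> bool:
    """Trigger retrieval only when the uncertainty score exceeds the threshold."""
    return score > threshold


if __name__ == "__main__":
    # Toy 3-token vocabulary, 3 prefix positions each (made-up numbers).
    confident = [[10.0, 0.0, 0.0]] * 3   # large top-1/top-2 gap
    uncertain = [[1.0, 0.9, 0.8]] * 3    # near-tied logits
    for name, draft in (("confident", confident), ("uncertain", uncertain)):
        score = margin_score(draft)
        print(name, round(score, 4), should_retrieve(score, threshold=0.5))
```

In practice the prefix logits would come from the base model's short no-context draft, and the threshold would be tuned to trade retrieval rate against accuracy; both are left outside this sketch.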
