Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG
Yufeng Wang, Lu Wei, Haibin Ling

TL;DR
The paper introduces TARG, a training-free adaptive retrieval gating method for RAG that improves efficiency and accuracy by selectively retrieving based on a lightweight uncertainty score from the base model's draft.
Contribution
TARG is a novel, training-free, model-agnostic gating policy that reduces retrieval and latency in RAG without sacrificing performance, using only draft logits.
Findings
TARG reduces retrieval by 70-90% while maintaining or improving accuracy.
The margin signal from prefix logits is a robust default for modern instruction-tuned LLMs.
TARG remains close to Never-RAG in overhead, with significant efficiency gains.
Abstract
Retrieval-Augmented Generation (RAG) improves factuality, but retrieving for every query often hurts quality while inflating tokens and latency. We propose Training-free Adaptive Retrieval Gating (TARG), a single-shot policy that decides when to retrieve using only a short, no-context draft from the base model. From the draft's prefix logits, TARG computes a lightweight uncertainty score (mean token entropy, a margin signal derived from the top-1/top-2 logit gap via a monotone link, or small-N variance across a handful of stochastic prefixes) and triggers retrieval only when the score exceeds a threshold. The gate is model-agnostic, adds only tens to hundreds of draft tokens, and requires no additional training or auxiliary heads. On five QA benchmarks spanning short-answer (NQ-Open, TriviaQA, PopQA), multi-hop (MuSiQue), and long-form (ASQA) tasks, TARG consistently pushes the…
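The gating rule in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `exp(-scale * gap)` link, and the threshold value are assumptions made for illustration; the abstract only says the margin signal uses "a monotone link" and does not give thresholds. The variance-based score is omitted because it needs several sampled drafts from the model.

```python
import math
from typing import List, Sequence

Logits = Sequence[float]  # one vocabulary-sized logit vector per draft token


def softmax(logits: Logits) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_token_entropy(prefix_logits: Sequence[Logits]) -> float:
    """Average Shannon entropy (in nats) over the draft prefix tokens."""
    total = 0.0
    for logits in prefix_logits:
        probs = softmax(logits)
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(prefix_logits)


def margin_score(prefix_logits: Sequence[Logits], scale: float = 1.0) -> float:
    """Average uncertainty from the top-1/top-2 logit gap.

    Assumption: exp(-scale * gap) is used as the monotone link, so a
    large gap (confident token) maps to a score near 0 and a zero gap
    maps to 1. The paper's actual link may differ.
    """
    total = 0.0
    for logits in prefix_logits:
        top1, top2 = sorted(logits, reverse=True)[:2]
        total += math.exp(-scale * (top1 - top2))
    return total / len(prefix_logits)


def should_retrieve(score: float, threshold: float) -> bool:
    """Trigger retrieval only when the uncertainty score exceeds the threshold."""
    return score > threshold


# Toy logits over a 4-token vocabulary for a 3-token draft prefix.
confident = [[10.0, 0.0, 0.0, 0.0]] * 3   # large top-1/top-2 gap
uncertain = [[1.0, 1.0, 1.0, 1.0]] * 3    # zero gap, uniform distribution

# The threshold 0.5 is a hypothetical value, not one reported in the paper.
print(should_retrieve(margin_score(confident), threshold=0.5))  # False
print(should_retrieve(margin_score(uncertain), threshold=0.5))  # True
```

In practice the logit vectors would come from the base model's short no-context draft, and the threshold would presumably be tuned per model and task to trade retrieval rate against accuracy.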
