Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

Yiwen Chen; Kuan Li; Fuzhen Zhuang; Deqing Wang; Zhao Zhang; Liwen Zhang; Yong Jiang; Shuai Wang; Minhao Cheng

arXiv:2605.10235·cs.CL·May 13, 2026

TL;DR

The paper introduces Pre-Route, a proactive routing framework in which an LLM chooses between RAG and long-context strategies before answering, improving efficiency and interpretability in long-document reasoning.

Contribution

Pre-Route uses structured reasoning to activate the latent routing abilities of LLMs, producing explainable, cost-effective task analysis and routing decisions, and distills this routing ability into smaller models.

Findings

01. LLMs can reliably perform structured routing with guidelines.

02. Structured prompts improve separability of routing decisions in representation space.

03. Distilled reasoning structures enable lightweight deployment.

Abstract

Recent advances in large language models (LLMs) have expanded the context window to beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, lies in choosing between retrieval-augmented generation (RAG) and long-context (LC) strategies: RAG is efficient but constrained by retrieval quality, while LC supports global reasoning at higher cost and with position sensitivity. Existing methods such as Self-Route adopt failure-driven fallback from RAG to LC, but remain passive, inefficient, and hard to interpret. We propose Pre-Route, a proactive routing framework that performs structured reasoning before answering. Using lightweight metadata (e.g., document type, length, initial snippet), Pre-Route enables task analysis, coverage estimation, and information-need prediction, producing explainable and cost-efficient routing decisions. Our…
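The routing step described in the abstract can be illustrated with a small sketch. This is not the paper's prompt or implementation: the `DocMetadata` fields, the prompt wording, the `ROUTE:` output convention, and the long-context default are all assumptions made for illustration. The sketch only shows the general shape of deciding between RAG and LC from lightweight metadata before any retrieval or full-document read takes place.

```python
from dataclasses import dataclass


@dataclass
class DocMetadata:
    """Lightweight metadata about a document (hypothetical field names)."""
    doc_type: str      # e.g. "contract", "novel", "research paper"
    num_tokens: int    # approximate document length in tokens
    snippet: str       # initial snippet of the document


def build_routing_prompt(question: str, meta: DocMetadata) -> str:
    """Build a structured routing prompt (wording is an assumption)."""
    return (
        "Decide how to answer the question before reading the full document.\n"
        f"Document type: {meta.doc_type}\n"
        f"Document length (tokens): {meta.num_tokens}\n"
        f"Initial snippet: {meta.snippet}\n"
        f"Question: {question}\n"
        "Reason about (1) the task, (2) whether a few retrieved chunks "
        "would cover it, and (3) what information is needed.\n"
        "End with a line of the form 'ROUTE: RAG' or 'ROUTE: LC'."
    )


def parse_route(response: str, default: str = "LC") -> str:
    """Extract the routing decision from the model's response.

    Scans from the last line upward for a 'ROUTE:' line. Falls back to
    `default` when no valid decision is found (the default is an assumption).
    """
    for line in reversed(response.strip().splitlines()):
        line = line.strip().upper()
        if line.startswith("ROUTE:"):
            choice = line.split(":", 1)[1].strip()
            if choice in ("RAG", "LC"):
                return choice
    return default
```

In use, the prompt would be sent to whichever LLM acts as the router, and `parse_route` would be applied to its reply. Because the decision is made up front, the reasoning that precedes the `ROUTE:` line is available as an explanation, in contrast to a fallback scheme that only switches to LC after a RAG attempt fails.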
