LLM Router: Rethinking Routing with Prefill Activations

Tanay Varshney; Annie Surla; Michelle Xu; Gomathy Venkata Krishnan; Maximilian Jeblick; David Austin; Neal Vaidya; Davide Onofrio

arXiv:2603.20895 · cs.CL · April 2, 2026

TL;DR

This paper introduces a routing method that selects a large language model (LLM) for each query using internal prefill activations. It outperforms routers built on semantic query features and reduces inference cost.

Contribution

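It proposes Encoder-Target Decoupling, in which prefill activations from one model (the Encoder) are used to predict whether another model (the Target, possibly closed-source) will answer correctly. It also introduces SharedTrunkNet, a network that predicts model correctness from these activations so that the router can select a model for each query.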
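The contribution above ends in a model-selection step. A minimal sketch of that step, assuming the correctness predictor (for example SharedTrunkNet applied to an Encoder's prefill activations) has already produced one probability per query and Target model: the hypothetical `route` function picks the Target with the highest predicted correctness, optionally penalized by a hypothetical `cost_weight` times each Target's cost. The cost-penalized score is an illustrative assumption, not the paper's stated selection rule, and the toy predictions are made up.

```python
import numpy as np


def route(pred_correct: np.ndarray, costs: np.ndarray, cost_weight: float = 0.0) -> np.ndarray:
    """Pick one Target model per query (hypothetical selection rule).

    pred_correct: (n_queries, n_targets) predicted probability that each Target
                  answers each query correctly, e.g. from a correctness predictor
                  reading an open-weight Encoder's prefill activations.
    costs:        (n_targets,) relative cost per call of each Target.
    cost_weight:  hypothetical trade-off knob; 0.0 means pure accuracy routing.
    """
    # Score = predicted correctness minus a linear cost penalty (assumption).
    scores = pred_correct - cost_weight * costs[None, :]
    # Index of the highest-scoring Target for every query.
    return scores.argmax(axis=1)


# Hypothetical toy predictions for 3 queries and 2 Targets
# (Target 0 is cheap, Target 1 is expensive).
pred = np.array([[0.90, 0.85], [0.20, 0.70], [0.60, 0.65]])
target_costs = np.array([1.0, 10.0])
print(route(pred, target_costs))                    # accuracy-only routing
print(route(pred, target_costs, cost_weight=0.01))  # cost-aware routing
```

With `cost_weight=0` the router always takes the Target most likely to be correct; a small positive value lets the cheaper Target win when its predicted correctness is close, as on the third toy query.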

Findings

1. SharedTrunkNet outperforms semantic baselines at predicting model correctness.
2. The method closes 45.58% of the performance gap between the best single model and an oracle.
3. It achieves 74.31% cost savings relative to the most expensive model.
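The two headline numbers are ratios, and this summary does not give their exact definitions. A minimal sketch, assuming the common definitions: the oracle counts a query as solved if any candidate model answers it correctly, gap closed is (router accuracy - best single-model accuracy) / (oracle accuracy - best single-model accuracy), and cost savings is 1 - (router cost / cost of sending every query to the most expensive model). The function name `routing_metrics` and the toy correctness and cost matrices are hypothetical, not data from the paper.

```python
import numpy as np


def routing_metrics(correct: np.ndarray, costs: np.ndarray, choice: np.ndarray) -> dict:
    """Gap closed and cost savings for one routing policy (assumed definitions).

    correct: (n_queries, n_models) 0/1 matrix, 1 if model j answers query i correctly.
    costs:   (n_queries, n_models) cost of running model j on query i.
    choice:  (n_queries,) index of the model the router picked for each query.
    """
    rows = np.arange(correct.shape[0])
    best_single = correct.mean(axis=0).max()   # accuracy of the best fixed model
    oracle = correct.max(axis=1).mean()        # solved if any model is correct
    router = correct[rows, choice].mean()      # accuracy of the routed answers
    headroom = oracle - best_single
    gap_closed = (router - best_single) / headroom if headroom > 0 else float("nan")
    priciest = costs.sum(axis=0).argmax()      # most expensive model in total
    savings = 1.0 - costs[rows, choice].sum() / costs[:, priciest].sum()
    return {"best_single": best_single, "oracle": oracle, "router": router,
            "gap_closed": gap_closed, "cost_savings": savings}


# Hypothetical toy data: 4 queries, a cheap model (0) and an expensive model (1).
correct = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
costs = np.tile([1.0, 4.0], (4, 1))
choice = np.array([0, 1, 1, 1])
metrics = routing_metrics(correct, costs, choice)
print(metrics)
```

On this toy data each model alone scores 0.5, the oracle scores 1.0, and the router scores 0.75, so it closes half of the gap while spending 13 of the 16 cost units that always using model 1 would cost (an 18.75% saving).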

Abstract

LLMs often achieve similar average benchmark accuracies while exhibiting complementary strengths on different subsets of queries, suggesting that a router with query-specific model selection can outperform any single model. While existing routers rely on semantic query features, they often fail to capture model-specific failures or intrinsic task difficulty. We instead study routing via internal prefill activations. Our key idea, Encoder-Target Decoupling, separates the model that produces the predictive signal (the Encoder) from the model whose correctness is being estimated (the Target), allowing open-weight encoders to predict the performance of closed-source target models. We evaluate layerwise geometric probes, finding that Fisher Separability (J) effectively identifies informative layers, supported by Effective Dimensionality (d_eff) diagnostics. We then utilize a SharedTrunkNet,…
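The layerwise probes named in the abstract can be sketched with two standard statistics. Assumptions: Fisher Separability is computed as the trace ratio J = tr(S_B) / tr(S_W) between the correctly and incorrectly answered queries at one layer, and effective dimensionality is the participation ratio d_eff = (Σλ)² / Σλ² of the activation covariance eigenvalues; the paper may define either one differently. The synthetic "layers" below are random data with increasing class separation, standing in for pooled prefill activations, and the function names are hypothetical.

```python
import numpy as np


def fisher_separability(x: np.ndarray, y: np.ndarray) -> float:
    """Trace-ratio Fisher criterion J = tr(S_B) / tr(S_W) for binary labels.

    x: (n_samples, d) activations from one layer.
    y: (n_samples,) labels, 1 if the Target answered correctly, else 0.
    """
    mu = x.mean(axis=0)
    between, within = 0.0, 0.0
    for c in (0, 1):
        xc = x[y == c]
        mu_c = xc.mean(axis=0)
        between += len(xc) * np.sum((mu_c - mu) ** 2)  # trace of between-class scatter
        within += np.sum((xc - mu_c) ** 2)             # trace of within-class scatter
    return float(between / within)


def effective_dimensionality(x: np.ndarray) -> float:
    """Participation ratio: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    eig = np.linalg.eigvalsh(np.cov(x, rowvar=False))
    eig = np.clip(eig, 0.0, None)  # drop tiny negative values from round-off
    return float(eig.sum() ** 2 / np.sum(eig ** 2))


# Synthetic stand-in for per-layer activations: the class shift grows with depth.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
layers = [rng.normal(size=(500, 16)) + shift * y[:, None] for shift in (0.0, 0.5, 2.0)]

j_per_layer = [fisher_separability(h, y) for h in layers]
best_layer = int(np.argmax(j_per_layer))  # most informative layer under J
d_eff_per_layer = [effective_dimensionality(h) for h in layers]
print(best_layer, [round(j, 3) for j in j_per_layer], [round(d, 2) for d in d_eff_per_layer])
```

Picking the layer with the largest J mirrors the abstract's use of J to identify informative layers, while d_eff serves as a side diagnostic of how many directions the activations spread over; in this toy setup the strongly separated layer also has a much lower d_eff, because the class shift adds one dominant direction.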