LLM Router: Rethinking Routing with Prefill Activations
Tanay Varshney, Annie Surla, Michelle Xu, Gomathy Venkata Krishnan, Maximilian Jeblick, David Austin, Neal Vaidya, Davide Onofrio

TL;DR
This paper introduces a routing method for large language models that predicts per-model correctness from internal prefill activations rather than semantic query features. It outperforms semantic routers and cuts cost by 74.31% relative to the most expensive model.
Contribution
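It proposes Encoder-Target Decoupling, in which an open-weight Encoder model supplies the prefill activations used to estimate the correctness of a separate, possibly closed-source, Target model. It also proposes SharedTrunkNet, a model that predicts target correctness from those activations so that a router can select a model per query.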
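The idea of predicting several targets' correctness from one encoder's activations can be sketched as a shared trunk with one output per target model. This is a minimal illustration only: the summary does not describe SharedTrunkNet's actual architecture, so the single ReLU hidden layer, the untrained random weights, and the names `init_shared_trunk`, `predict_correctness`, and `route` are all assumptions made for the example.

```python
import numpy as np

def init_shared_trunk(d_in, d_hidden, n_targets, seed=0):
    """Random (untrained) weights for a hypothetical trunk plus one head per target."""
    rng = np.random.default_rng(seed)
    return {
        "W_trunk": rng.normal(0.0, 0.1, (d_in, d_hidden)),
        "b_trunk": np.zeros(d_hidden),
        "W_heads": rng.normal(0.0, 0.1, (d_hidden, n_targets)),
        "b_heads": np.zeros(n_targets),
    }

def predict_correctness(params, activations):
    """Map encoder prefill activations (batch, d_in) to P(correct), shape (batch, n_targets)."""
    # Shared representation: every target head reads the same hidden features.
    hidden = np.maximum(0.0, activations @ params["W_trunk"] + params["b_trunk"])
    logits = hidden @ params["W_heads"] + params["b_heads"]
    return 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per target

def route(probs):
    """Pick, for each query, the target with the highest predicted correctness."""
    return probs.argmax(axis=1)

# Stand-in for pooled prefill activations from an open-weight encoder (assumed shape).
acts = np.random.default_rng(1).normal(size=(5, 8))
params = init_shared_trunk(d_in=8, d_hidden=4, n_targets=3)
probs = predict_correctness(params, acts)
print(route(probs))
```

In practice the weights would be trained on labeled (query, target, correct or incorrect) examples, and a router might also weigh per-target cost rather than taking a plain argmax.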
Findings
SharedTrunkNet outperforms semantic baselines in model correctness prediction.
The method closes 45.58% of the performance gap between the best model and an oracle.
It achieves 74.31% cost savings compared to the most expensive model.
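The two headline numbers are relative quantities, and one plausible way to compute them is sketched here. The formulas are assumptions about how such metrics are commonly defined, not definitions taken from the paper, and the input values are invented for illustration rather than reported results.

```python
def gap_closed(router_acc, best_single_acc, oracle_acc):
    """Fraction of the (oracle - best single model) accuracy gap recovered by the router."""
    gap = oracle_acc - best_single_acc
    if gap <= 0:
        raise ValueError("oracle accuracy must exceed best single-model accuracy")
    return (router_acc - best_single_acc) / gap

def cost_savings(router_cost, most_expensive_cost):
    """Relative cost reduction versus always calling the most expensive model."""
    if most_expensive_cost <= 0:
        raise ValueError("reference cost must be positive")
    return 1.0 - router_cost / most_expensive_cost

# Made-up numbers: best single model 70%, oracle 90%, router 80% accuracy.
print(round(gap_closed(0.80, 0.70, 0.90), 4))   # about half the gap is closed
# Made-up costs: router spends 25 units where the priciest model would spend 100.
print(cost_savings(25.0, 100.0))
```

Under these definitions, a router that matches the best single model closes 0% of the gap and one that matches the oracle closes 100%.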
Abstract
LLMs often achieve similar average benchmark accuracies while exhibiting complementary strengths on different subsets of queries, suggesting that a router with query-specific model selection can outperform any single model. While existing routers rely on semantic query features, they often fail to capture model-specific failures or intrinsic task difficulty. We instead study routing via internal prefill activations. Our key idea, Encoder-Target Decoupling, separates the model that produces the predictive signal (the Encoder) from the model whose correctness is being estimated (the Target), allowing open-weight encoders to predict the performance of closed-source target models. We evaluate layerwise geometric probes, finding that Fisher Separability (J) effectively identifies informative layers, supported by Effective Dimensionality (d_eff) diagnostics. We then utilize a SharedTrunkNet,…
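To make the layerwise probes more concrete, here is a minimal sketch of two diagnostics of this kind, computed on one layer's activations. The abstract does not give the exact formulas, so both are assumptions: Fisher separability is taken as the ratio of between-class to within-class scatter traces for correct versus incorrect queries, and effective dimensionality as the participation ratio of the covariance eigenvalues. The function names are hypothetical.

```python
import numpy as np

def fisher_separability(X, y):
    """J = tr(S_B) / tr(S_W) for activations X (n, d) and binary labels y (1 = correct).

    Assumes both classes are present in y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    mu = X.mean(axis=0)
    between, within = 0.0, 0.0
    for mask in (y, ~y):
        Xc = X[mask]
        mu_c = Xc.mean(axis=0)
        between += len(Xc) * np.sum((mu_c - mu) ** 2)  # class mean vs. global mean
        within += np.sum((Xc - mu_c) ** 2)             # spread around the class mean
    return between / within

def effective_dimensionality(X):
    """Participation ratio (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    X = np.asarray(X, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # guard tiny negative values
    return eig.sum() ** 2 / np.sum(eig ** 2)

# Synthetic stand-in for one layer's activations; labels mark correct answers.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 16)) + 2.0 * y[:, None]  # classes shifted apart
print(fisher_separability(X, y), effective_dimensionality(X))
```

A layer-selection loop would compute J for each layer's activations and keep the layers with the highest values, using the dimensionality estimate as a supporting check.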
