SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context

Hairu Wang; Yuan Feng; Yukun Cao; Xike Xie; S Kevin Zhou

arXiv:2505.23841·cs.IR·October 14, 2025

TL;DR

SkewRoute is a training-free routing method for knowledge graph retrieval-augmented generation that uses score skewness to efficiently direct queries to appropriate-sized LLMs, significantly reducing inference costs.

Contribution

It introduces the first dedicated, training-free routing framework for KG-RAG, based on the skewness of retrieval scores, and reports over 3x higher routing effectiveness with minimal runtime overhead.

Findings

01 Over 3x higher routing effectiveness

02 Reduces runtime to less than 0.001x of existing methods

03 Effective in balancing performance and cost in KG-RAG

Abstract

Large language models excel at many tasks but often incur high inference costs during deployment. To mitigate hallucination, many systems use a knowledge graph to enhance retrieval-augmented generation (KG-RAG). However, the large amount of retrieved knowledge context increases these inference costs further. A promising way to balance performance and cost is LLM routing, which directs simple queries to smaller LLMs and complex ones to larger LLMs. However, no dedicated routing methods currently exist for RAG, and existing training-based routers struggle to scale to this domain because they need extensive training data. We observe that the score distributions produced by the retrieval scorer strongly correlate with query difficulty. Based on this, we propose an extremely simple yet effective routing framework, the first specifically designed for KG-RAG that efficiently…
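To make the idea concrete, here is a minimal sketch of skewness-based routing. It is not the authors' implementation: the abstract does not say which skewness statistic or threshold is used, so the moment-based skewness, the `threshold` value, the `"small"`/`"large"` labels, and the assumption that a strongly right-skewed score distribution signals an easier query are all illustrative choices.

```python
from statistics import fmean


def score_skewness(scores):
    """Moment-based skewness m3 / m2**1.5 of the retrieval scores.

    Assumption: this particular statistic is chosen for illustration only.
    """
    if len(scores) < 3:
        return 0.0
    mean = fmean(scores)
    m2 = fmean((s - mean) ** 2 for s in scores)  # second central moment
    m3 = fmean((s - mean) ** 3 for s in scores)  # third central moment
    if m2 == 0:
        return 0.0  # all scores identical: no skew to measure
    return m3 / m2 ** 1.5


def route(scores, threshold=1.0):
    """Pick a model size from the retrieval scores alone (no training).

    Assumption: a few dominant scores (high positive skew) mean the
    relevant context is easy to identify, so a smaller LLM suffices.
    The threshold of 1.0 is a hypothetical value, not from the paper.
    """
    return "small" if score_skewness(scores) >= threshold else "large"


# One context clearly dominates versus scores that are all similar.
peaked = [0.95, 0.10, 0.08, 0.07, 0.05, 0.05, 0.04, 0.03]
flat = [0.50, 0.48, 0.52, 0.49, 0.51, 0.50, 0.47, 0.53]
print(route(peaked), route(flat))
```

Because the router only reads scores that the retriever already produced, it adds no model calls of its own, which is consistent with the minimal runtime overhead claimed above.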

Taxonomy

Topics: Advanced Graph Neural Networks · Brain Tumor Detection and Classification · Neural Networks and Applications

Methods: Linear Layer · Byte Pair Encoding · Attention Dropout · Softmax · WordPiece · BART · Weight Decay · Multi-Head Attention · Attention Is All You Need