GraphRouter: A Graph-based Router for LLM Selections
Tao Feng, Yanzhen Shen, Jiaxuan You

TL;DR
GraphRouter introduces a graph-based framework that leverages contextual interactions among tasks, queries, and LLMs to improve selection efficiency, generalization, and performance across diverse scenarios without retraining.
Contribution
It proposes a novel inductive graph framework for LLM selection that captures contextual information and predicts edge attributes, enabling adaptive and generalizable LLM recommendations.
Findings
Surpasses existing routers with a performance improvement of at least 12.3%
Generalizes to new LLMs with a boost in effect of at least 9.5%
Reduces computational demands significantly
Abstract
The rapidly growing number and variety of Large Language Models (LLMs) present significant challenges in efficiently selecting the appropriate LLM for a given query, especially considering the trade-offs between performance and computational cost. Current LLM selection methods often struggle to generalize across new LLMs and different tasks because of their limited ability to leverage contextual interactions among tasks, queries, and LLMs, as well as their dependence on a transductive learning framework. To address these shortcomings, we introduce a novel inductive graph framework, named GraphRouter, which fully utilizes the contextual information among tasks, queries, and LLMs to enhance the LLM selection process. GraphRouter constructs a heterogeneous graph comprising task, query, and LLM nodes, with interactions represented as edges, which efficiently captures the contextual…
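The graph described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `HeteroGraph` class, the `select_llm` function, the trade-off weight `alpha`, the node names, and all numeric values are hypothetical. In GraphRouter the query-LLM edge attributes would be predicted by a trained graph model; here they are hard-coded purely to show the data layout and how a selection could be read off the edges.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph: typed nodes plus attributed edges (hypothetical)."""
    nodes: dict = field(default_factory=dict)  # node_id -> node type
    edges: dict = field(default_factory=dict)  # (src, dst) -> attribute dict

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, src, dst, **attrs):
        self.edges[(src, dst)] = attrs


def select_llm(graph, query_id, alpha=0.5):
    """Return the LLM whose query-LLM edge has the best effect/cost score.

    The score alpha * effect - (1 - alpha) * cost is an assumed trade-off,
    not a formula taken from the paper.
    """
    candidates = {
        dst: attrs
        for (src, dst), attrs in graph.edges.items()
        if src == query_id and graph.nodes[dst] == "llm"
    }
    return max(
        candidates,
        key=lambda m: alpha * candidates[m]["effect"]
        - (1 - alpha) * candidates[m]["cost"],
    )


# Build a tiny graph with one task, one query, and two candidate LLMs.
g = HeteroGraph()
g.add_node("qa", "task")
g.add_node("q1", "query")
g.add_node("small-llm", "llm")
g.add_node("large-llm", "llm")

# Task-query edge; one review excerpt notes these are initialized to 1.
g.add_edge("qa", "q1", weight=1.0)

# Query-LLM edges carry made-up effect and cost attributes.
g.add_edge("q1", "small-llm", effect=0.6, cost=0.1)
g.add_edge("q1", "large-llm", effect=0.9, cost=0.8)

print(select_llm(g, "q1", alpha=0.5))  # balanced trade-off
print(select_llm(g, "q1", alpha=1.0))  # effect only
```

With these made-up numbers, the balanced setting picks the cheaper model and the effect-only setting picks the stronger one. This mirrors the performance-versus-cost trade-off the abstract mentions, without claiming to reproduce the paper's scoring.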
Peer Reviews
Decision·ICLR 2025 Poster
1. This paper studies the challenging LLM model selection problem, which has been well addressed. 2. This paper considers leveraging graph learning to incorporate more contextual information for LLM model selection. 3. The experiments show that the proposed method achieves good performance compared to baselines.
1. The studied setting is not quite realistic. The proposed method constructs a graph with task, query, and LLM nodes. For each query, it may select a different LLM to answer the query, which is quite unrealistic in practice. 2. The model performance especially cost may vary a lot on different hardware settings. It is also unrealistic to make sure the real hardware used can align with the numbers in the training data. And we cannot curate the training data for different hardware settings. 3. S
The paper seems to be the first to reformulate LLM selection as a graph-based edge prediction problem, providing a fresh perspective on router design. The heterogeneous graph structure effectively captures the complex relationships between tasks, queries, and LLMs. The framework addresses real-world challenges in LLM deployment, particularly the ability to handle new LLMs without retraining and balance performance with computational costs. The evaluation across three different cost-performance s
1. Authors only provide intuitive explanations for why graph structure should help with LLM selection, lacking analysis on why edge prediction correlates with routing performance. Also, the paper fails to explain why the heterogeneous graph structure (Figure 5) is optimal for capturing LLM-query relationships. 2. In L. 219, task-query edges are initialized uniformly to 1, which seems overly simplistic given the rich task-query relationships that could be captured. LLM-query edge features only u
*S1* This paper is well-motivated and addresses an important problem. The rapid evolution of LLMs necessitates effective LLM selection, yet existing methods often over-simplify this task as failing to consider the contextual information between tasks, queries, and LLMs, and they struggle to accommodate newly emerged LLMs. The graph-based LLM selection method not only offers a new and effective approach to the LLM selection task but also contributes a novel application in LLM + GNN research. *S
*W1* The evaluation setting is simplified. On one hand, the experimental tasks primarily focus on reasoning and question answering, overlooking specialized areas such as code generation. On the other hand, the available LLMs are quite limited, with nearly half originating from the same series (i.e., LLaMA). *W2* Current methodology overlooks the inherent relationships among LLMs. For instance, some LLMs belong to the same series (e.g., LLaMA3-7B, LLaMA3-8B). Adding links to indicate such paren
Taxonomy
Topics: Natural Language Processing Techniques · Digital Rights Management and Security · Semantic Web and Ontologies
