DiSRouter: Distributed Self-Routing for LLM Selections

Hang Zheng; Hongshen Xu; Yongkai Lin; Shuai Fan; Lu Chen; Kai Yu

arXiv:2510.19208 · cs.CL · March 3, 2026

Open Access · 3 Reviews

TL;DR

DiSRouter introduces a distributed, self-aware routing system for LLMs, enabling flexible, scalable, and effective query routing by leveraging each model's self-assessment capabilities, outperforming traditional centralized methods.

Contribution

This work proposes a novel distributed self-routing paradigm for LLMs, utilizing self-awareness training to improve query routing without relying on external routers.

Findings

1. Outperforms existing routing methods in utility.
2. Effectively distinguishes easy and hard queries.
3. Generalizes well to out-of-domain tasks.

Abstract

The proliferation of Large Language Models (LLMs) has created a diverse ecosystem of models with highly varying performance and costs, necessitating effective query routing to balance performance and expense. Current routing systems often rely on a centralized external router trained on a fixed set of LLMs, which makes them inflexible and prone to poor performance, since a small router cannot fully understand the knowledge boundaries of different LLMs. We introduce DiSRouter (Distributed Self-Router), a novel paradigm that shifts from centralized control to distributed routing. In DiSRouter, a query traverses a network of LLM agents, each independently deciding whether to answer or route to other agents based on its self-awareness, that is, its ability to judge its own competence. This distributed design offers superior flexibility, scalability, and generalizability. To enable this, we propose a…
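The answer-or-route behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Agent` and `RouteResult` types, the `is_confident` callback (a stand-in for the trained self-assessment), the fixed small-to-large ordering, the word-count heuristic, and the choice to charge only the answering agent's cost are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """Hypothetical LLM agent: a name, a per-query cost, and two callbacks."""
    name: str
    cost: float
    is_confident: Callable[[str], bool]  # stand-in for learned self-awareness
    answer: Callable[[str], str]


@dataclass
class RouteResult:
    agent: str        # name of the agent that answered
    answer: str
    cost: float       # cost charged for the answering agent only (assumption)
    rejections: int   # number of agents that passed the query on


def route(query: str, agents: List[Agent]) -> RouteResult:
    """Pass the query along the agents in order; the first confident one answers.

    The last agent has nowhere to route to, so it always answers.
    """
    if not agents:
        raise ValueError("agents must be non-empty")
    for index, agent in enumerate(agents):
        is_last = index == len(agents) - 1
        if is_last or agent.is_confident(query):
            return RouteResult(agent.name, agent.answer(query), agent.cost, index)
    raise AssertionError("unreachable: the last agent always answers")


# Toy agents: the small one only trusts itself on short queries (toy heuristic).
small = Agent("small", 1.0, lambda q: len(q.split()) <= 5, lambda q: f"[small] {q}")
large = Agent("large", 10.0, lambda q: True, lambda q: f"[large] {q}")
agents = [small, large]

easy = route("What is 2 + 2?", agents)
hard = route("Prove that every bounded monotone sequence of real numbers converges", agents)
print(easy.agent, easy.rejections)
print(hard.agent, hard.rejections)
```

Because each agent decides locally, adding or removing an entry in `agents` requires no retraining of a central component in this sketch, which is the modularity the abstract refers to.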

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- **Originality**: The shift from centralized to distributed routing is novel, creatively integrating self-assessment concepts (e.g., LLM uncertainty) into a routing framework. The scenario-adaptive reward function is an innovative touch.
- **Quality**: Experimental design is rigorous within its scope, with utility metrics and modularity tests. The pipeline is methodically described.
- **Clarity**: The problem formulation in Section 2 is precise. Writing is concise.
- **Significance**: The "plug

Weaknesses

- **Limited Generalizability**: Experiments use only the Qwen2.5-Instruct series, neglecting other LLM families (e.g., Llama). This casts doubt on DiSRouter's applicability to diverse models. The authors should include cross-architecture validation.
- **Training Efficiency Concerns**: The Self-Awareness Training (SFT + RL) is computationally heavy, but the paper dismisses routing cost as "negligible" without quantifying training overhead. A cost-benefit analysis is missing.
- **Limited Robustness

Reviewer 02 · Rating 4 · Confidence 2

Strengths

1. Experiments show that DiSRouter's utility score (performance − α·cost) is significantly superior to both baselines and existing router systems across various performance-cost scenarios, with both in-domain and out-of-domain data. It is more flexible and has "plug-and-play" modular capabilities.
2. The hypothesis that training does not inject new knowledge but only improves self-boundary awareness can be supported by experimental results: ∆Performance < 1%, which verifies that the performance impr

Weaknesses

Although the experimental results have shown that the authors' method is far superior to the baseline and other routing methods, I believe that some additional perspectives are still beneficial for future work:
1. The cost introduced by the two-stage training approach may require further evaluation or comparison with other methods, as each agent needs to perform independent multiple-time inference to prepare its SFT data and conduct RL training.
2. For more open agent systems, data preparation i

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The paper introduces a "Distributed Self-Routing" framework, replacing the external centralized router with intrinsic "self-awareness."
2. It provides a concrete technical contribution with the two-stage "Self-Awareness Training" pipeline (SFT + RL), enabling models to internally assess their knowledge boundaries and execute a "reject" action.
3. The framework is inherently modular ("plug-and-play").

Weaknesses

1. The core utility metric (Utility = Performance - α * Cost) is flawed. It completely ignores the significant cumulative latency incurred by the cascade structure. A query rejected multiple times will have a real-world response time far exceeding that of a single-shot centralized router, a critical cost factor this paper overlooks.
2. The experiments are confined to a homogeneous model pool (all Qwen-family models). This is a major threat to validity. The paper provides no evidence that the "s
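The utility metric quoted in the reviews, Utility = Performance - α * Cost, can be made concrete with a short Python sketch. The formula comes from the reviews; the numbers below are hypothetical and are not results from the paper, and the sketch assumes performance and cost are both expressed on comparable, normalized scales. As Reviewer 03 notes, the formula has no latency term, so the sketch has none either.

```python
def utility(performance: float, cost: float, alpha: float) -> float:
    """Utility = Performance - alpha * Cost, as quoted in the reviews.

    A larger alpha penalizes cost more heavily.
    """
    return performance - alpha * cost


# Hypothetical (performance, cost) pairs for two routing strategies.
always_large = (0.85, 1.00)  # always use the most expensive model
routed = (0.80, 0.50)        # answer easy queries with cheaper models

# Compare the two strategies under a cost-insensitive and a cost-sensitive alpha.
for alpha in (0.0, 0.2):
    u_large = utility(*always_large, alpha)
    u_routed = utility(*routed, alpha)
    best = "routed" if u_routed > u_large else "always_large"
    print(f"alpha={alpha}: always_large={u_large:.2f} routed={u_routed:.2f} best={best}")
```

With these illustrative numbers, the preferred strategy flips once α is large enough for the cost saving to outweigh the accuracy gap, which is why the metric is reported across several performance-cost scenarios.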

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks