HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning
Nikunj Gupta, Bill Guo, Rajgopal Kannan, Viktor K. Prasanna

TL;DR
HierRouter is a reinforcement learning-based hierarchical routing system that dynamically assembles specialized lightweight language models for efficient, high-quality inference across diverse tasks, reducing costs while maintaining performance.
Contribution
The paper introduces a reinforcement learning framework for hierarchical routing across multiple LLMs, optimizing inference pipelines for both resource efficiency and task performance.
Findings
Up to 2.4x improvement in response quality over individual models
Minimal additional inference cost incurred
Effective across multiple benchmarks and tasks
Abstract
Large Language Models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs, limiting their deployment in resource-constrained or real-time settings. To address this, we propose HierRouter, a hierarchical routing approach that dynamically assembles inference pipelines from a pool of specialized, lightweight language models. Formulated as a finite-horizon Markov Decision Process (MDP), our approach trains a Proximal Policy Optimization (PPO)-based reinforcement learning agent to iteratively select which models to invoke at each stage of multi-hop inference. The agent conditions on the evolving context and accumulated cost to make context-aware routing decisions. Experiments with three open-source candidate LLMs across six benchmarks, including QA, code generation, and mathematical reasoning, show that HierRouter improves response…
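The finite-horizon routing loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The names `Model`, `State`, `rollout`, and `random_policy` are hypothetical, as are the per-call costs, the string-concatenation stand-in for a model call, and the reward form (final quality minus weighted accumulated cost). A uniform random policy stands in where a trained PPO policy would sit.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Model:
    name: str    # hypothetical label for a candidate LLM
    cost: float  # hypothetical per-call cost (arbitrary units)


@dataclass(frozen=True)
class State:
    context: str        # evolving context: query plus outputs so far
    cost_so_far: float  # accumulated inference cost
    hop: int            # current stage of multi-hop inference


Policy = Callable[[State, int], int]  # maps (state, number of models) to a model index


def random_policy(state: State, n_models: int) -> int:
    """Placeholder for a learned policy: picks a model uniformly at random."""
    return random.randrange(n_models)


def rollout(
    query: str,
    models: List[Model],
    policy: Policy,
    horizon: int,
    quality_fn: Callable[[str], float],
    cost_weight: float,
) -> Tuple[List[int], State, float]:
    """Run one episode of at most `horizon` routing decisions.

    Assumed reward: quality of the final context minus
    cost_weight times the accumulated cost.
    """
    state = State(context=query, cost_so_far=0.0, hop=0)
    actions: List[int] = []
    while state.hop < horizon:
        action = policy(state, len(models))
        model = models[action]
        # Stand-in for invoking the chosen model on the current context.
        new_context = f"{state.context} -> [{model.name}]"
        state = State(new_context, state.cost_so_far + model.cost, state.hop + 1)
        actions.append(action)
    reward = quality_fn(state.context) - cost_weight * state.cost_so_far
    return actions, state, reward
```

In this sketch the state carries exactly the two signals the abstract mentions, the evolving context and the accumulated cost, so a learned policy could be substituted for `random_policy` without changing the loop.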
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Neural Network Applications
