Dynamic LLM Routing and Selection based on User Preferences: Balancing Performance, Cost, and Ethics
Deepak Babu Piskala, Vijay Raajaa, Sachin Mishra, Bruno Bozza

TL;DR
This paper presents OptiRoute, a dynamic routing system that intelligently selects the most suitable large language model for specific tasks by balancing performance, cost, and ethical considerations, using lightweight analysis and hybrid filtering.
Contribution
The paper introduces OptiRoute, a novel model routing engine that efficiently matches tasks to optimal LLMs based on detailed user preferences and criteria.
Findings
OptiRoute effectively balances cost, performance, and ethics in model selection.
The hybrid kNN and hierarchical filtering approach reduces computational overhead.
Real-time routing demonstrates improved efficiency in cloud and regulated environments.
Abstract
With the widespread deployment of large language models (LLMs) such as GPT-4, BART, and LLaMA, the need for a system that can intelligently select the most suitable model for specific tasks while balancing cost, latency, accuracy, and ethical considerations has become increasingly important. Recognizing that not all tasks necessitate models with over 100 billion parameters, we introduce OptiRoute, an advanced model routing engine designed to dynamically select and route tasks to the optimal LLM based on detailed user-defined requirements. OptiRoute captures both functional (e.g., accuracy, speed, cost) and non-functional (e.g., helpfulness, harmlessness, honesty) criteria, leveraging lightweight task analysis and complexity estimation to efficiently match tasks with the best-fit models from a diverse array of LLMs. By employing a hybrid approach combining k-nearest neighbors (kNN) search…
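The abstract is truncated here, so the exact routing procedure is not visible on this page. As a rough illustration only, the general idea of pairing a hard-constraint filter with a kNN lookup over model profiles might be sketched as below. The `ModelProfile` class, the six-dimensional feature layout, the domain tags, and the catalog entries are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of one candidate LLM (not from the paper)."""
    name: str
    domains: frozenset  # task domains the model is allowed to serve
    # Assumed layout, each score in [0, 1]:
    # (accuracy, speed, cost_efficiency, helpfulness, harmlessness, honesty)
    features: tuple


def hard_filter(models, domain):
    """Filtering stage: drop models that cannot serve the requested domain."""
    return [m for m in models if domain in m.domains]


def knn_select(models, preference, k=1):
    """kNN stage: rank remaining models by Euclidean distance to the preference."""
    ranked = sorted(models, key=lambda m: math.dist(m.features, preference))
    return ranked[:k]


def route(models, domain, preference, k=1):
    """Filter first, then run kNN on the (usually much smaller) candidate set."""
    return knn_select(hard_filter(models, domain), preference, k)


# Made-up catalog, purely for illustration.
CATALOG = [
    ModelProfile("small-fast", frozenset({"general", "support"}),
                 (0.60, 0.90, 0.90, 0.70, 0.80, 0.70)),
    ModelProfile("large-accurate", frozenset({"general", "legal", "medical"}),
                 (0.95, 0.40, 0.20, 0.90, 0.90, 0.90)),
    ModelProfile("medical-tuned", frozenset({"medical"}),
                 (0.85, 0.60, 0.60, 0.80, 0.95, 0.90)),
]

if __name__ == "__main__":
    # A user who wants a medical-capable model with balanced cost and accuracy.
    wanted = (0.80, 0.60, 0.70, 0.80, 0.90, 0.90)
    for model in route(CATALOG, "medical", wanted, k=2):
        print(model.name)
```

Running the filter before the distance computation means the nearest-neighbor search only touches models that already satisfy the hard requirements, which is one plausible reading of how a hybrid scheme could reduce overhead. A production system would likely use an approximate nearest-neighbor index and learned task features rather than hand-written scores.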
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Adam · Softmax
