Dynamic LLM Routing and Selection based on User Preferences: Balancing Performance, Cost, and Ethics

Deepak Babu Piskala; Vijay Raajaa; Sachin Mishra; Bruno Bozza

arXiv:2502.16696·cs.LG·February 25, 2025

TL;DR

This paper presents OptiRoute, a dynamic routing system that intelligently selects the most suitable large language model for specific tasks by balancing performance, cost, and ethical considerations, using lightweight analysis and hybrid filtering.

Contribution

The paper introduces OptiRoute, a novel model routing engine that efficiently matches tasks to optimal LLMs based on detailed user preferences and criteria.

Findings

1. OptiRoute effectively balances cost, performance, and ethics in model selection.

2. The hybrid kNN and hierarchical filtering approach reduces computational overhead.

3. Real-time routing demonstrates improved efficiency in cloud and regulated environments.

Abstract

With the widespread deployment of large language models (LLMs) such as GPT-4, BART, and LLaMA, the need for a system that can intelligently select the most suitable model for specific tasks while balancing cost, latency, accuracy, and ethical considerations has become increasingly important. Recognizing that not all tasks necessitate models with over 100 billion parameters, we introduce OptiRoute, an advanced model routing engine designed to dynamically select and route tasks to the optimal LLM based on detailed user-defined requirements. OptiRoute captures both functional (e.g., accuracy, speed, cost) and non-functional (e.g., helpfulness, harmlessness, honesty) criteria, leveraging lightweight task analysis and complexity estimation to efficiently match tasks with the best-fit models from a diverse array of LLMs. By employing a hybrid approach combining k-nearest neighbors (kNN) search…
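The routing idea summarized above (hard filtering on user constraints plus a nearest-neighbor match on functional preferences) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `ModelProfile` fields, the score values, the model names, the order of the two stages, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical model descriptor; all scores are normalised to [0, 1]."""
    name: str
    accuracy: float          # functional criterion, higher is better
    speed: float             # functional criterion, higher is better
    cost_efficiency: float   # functional criterion, higher means cheaper
    harmlessness: float      # non-functional criterion, higher is better
    domains: frozenset       # task domains the model is assumed to support


def hierarchical_filter(models, domain, min_harmlessness):
    """Assumed filtering stage: drop models that fail hard constraints."""
    return [
        m for m in models
        if domain in m.domains and m.harmlessness >= min_harmlessness
    ]


def knn_rank(models, preference, k=1):
    """Assumed kNN stage: the k models closest to the preference vector."""
    def as_vector(m):
        return (m.accuracy, m.speed, m.cost_efficiency)
    ranked = sorted(models, key=lambda m: math.dist(as_vector(m), preference))
    return ranked[:k]


def route(models, domain, preference, min_harmlessness=0.0):
    """Return the best-fit model, or None if no model meets the constraints."""
    candidates = hierarchical_filter(models, domain, min_harmlessness)
    if not candidates:
        return None
    return knn_rank(candidates, preference, k=1)[0]


# Hypothetical registry; names and numbers are invented for this example.
EXAMPLE_REGISTRY = [
    ModelProfile("small-model", 0.60, 0.90, 0.90, 0.80, frozenset({"general"})),
    ModelProfile("medium-model", 0.80, 0.60, 0.60, 0.50,
                 frozenset({"general", "legal"})),
    ModelProfile("large-model", 0.95, 0.30, 0.20, 0.90,
                 frozenset({"general", "legal"})),
]

if __name__ == "__main__":
    # A cost- and speed-sensitive user with a general-purpose task.
    choice = route(EXAMPLE_REGISTRY, "general", (0.6, 1.0, 1.0))
    print(choice.name if choice else "no eligible model")
```

In this sketch the preference vector is ordered as (accuracy, speed, cost efficiency). Filtering before the distance search means the nearest-neighbor step only ranks models that already satisfy the hard constraints, which is one plausible way such a hybrid could keep per-request work small.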

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Adam · Softmax