PickLLM: Context-Aware RL-Assisted Large Language Model Routing
Dimitrios Sikeridis, Dennis Ramdass, Pranay Pareek

TL;DR
PickLLM is a reinforcement learning (RL)-based framework that dynamically routes queries to the most suitable large language model, optimizing for cost, latency, and accuracy in real time.
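Optimizing for cost, latency, and accuracy together requires collapsing them into a single reward signal. One common approach is a weighted sum. The sketch below is a minimal illustration under that assumption. It is not taken from the paper: the `QueryOutcome` fields, the `weighted_reward` function, and its default weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryOutcome:
    """Observed result of sending one query to one LLM (hypothetical fields)."""
    cost_usd: float   # price paid for this query
    latency_s: float  # end-to-end response time in seconds
    score: float      # response-quality score in [0, 1] from a user-chosen scorer


def weighted_reward(outcome: QueryOutcome,
                    w_score: float = 1.0,
                    w_cost: float = 0.5,
                    w_latency: float = 0.1) -> float:
    """Scalarize quality, cost, and latency into one reward (assumed linear form).

    Higher quality raises the reward. Higher cost and latency lower it.
    The weights are hypothetical knobs a user could tune to express priorities.
    """
    return (w_score * outcome.score
            - w_cost * outcome.cost_usd
            - w_latency * outcome.latency_s)


# Made-up outcomes for two candidate models answering the same query.
cheap_fast = QueryOutcome(cost_usd=0.001, latency_s=0.4, score=0.70)
pricey_slow = QueryOutcome(cost_usd=0.030, latency_s=2.5, score=0.85)
print(weighted_reward(cheap_fast), weighted_reward(pricey_slow))
```

With these illustrative weights, the cheaper, faster response earns the higher reward (about 0.66 vs. 0.59). Setting `w_cost` and `w_latency` to 0 makes the higher-quality response win instead, which shows how such weights could encode a user's priorities.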
Contribution
It introduces a novel RL-based approach to LLM routing that accounts for multiple customizable objectives and converges efficiently to the optimal model choice.
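Efficient convergence to the best model can be illustrated with a standard RL technique that fits this setting: stateless Q-learning (equivalently, a multi-armed bandit) with ε-greedy exploration. The sketch below does not claim to be PickLLM's algorithm. The model names, the reward means, and the parameters `epsilon`, `alpha`, and `noise` are hypothetical, and rewards are simulated rather than measured.

```python
from __future__ import annotations

import random


def route_epsilon_greedy(mean_reward: dict[str, float],
                         n_queries: int = 2000,
                         epsilon: float = 0.1,
                         alpha: float = 0.1,
                         noise: float = 0.05,
                         seed: int = 0) -> tuple[dict[str, float], dict[str, int]]:
    """Route simulated queries with stateless Q-learning and epsilon-greedy picks.

    Each action is an LLM name. With no state, Q holds one value per model and
    is updated as Q <- Q + alpha * (r - Q). Rewards are simulated as a hidden
    per-model mean plus Gaussian noise (a hypothetical stand-in for a real
    cost/latency/quality reward).
    """
    rng = random.Random(seed)
    models = list(mean_reward)
    q = {m: 0.0 for m in models}    # value estimate per LLM
    picks = {m: 0 for m in models}  # how often each LLM was chosen
    for _ in range(n_queries):
        if rng.random() < epsilon:
            choice = rng.choice(models)      # explore: pick a random LLM
        else:
            choice = max(models, key=q.get)  # exploit: pick the best current estimate
        reward = mean_reward[choice] + rng.gauss(0.0, noise)
        q[choice] += alpha * (reward - q[choice])  # no next-state term: stateless update
        picks[choice] += 1
    return q, picks


# Hypothetical pool: model names and mean rewards are made up for illustration.
pool = {"small-local": 0.40, "mid-api": 0.75, "large-api": 0.60}
q, picks = route_epsilon_greedy(pool)
print(max(q, key=q.get), picks)
```

Because exploration continues at rate `epsilon`, the router keeps occasionally sampling the weaker models. This lets it adapt if their quality or pricing changes, at the cost of a small, constant fraction of suboptimal picks.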
Findings
Reduces query cost and latency effectively.
Converges quickly to optimal LLM choices.
Improves response quality based on customizable scoring.
Abstract
Recently, the number of off-the-shelf Large Language Models (LLMs) has exploded, with many open-source options. This creates a diverse landscape in terms of both serving options (e.g., inference on local hardware vs. remote LLM APIs) and heterogeneous model expertise. However, it is hard for users to optimize efficiently for operational cost (pricing structures, expensive LLMs-as-a-service for large query volumes), efficiency, or even case-specific measures such as response accuracy, bias, or toxicity. Moreover, existing LLM routing solutions focus mainly on cost reduction, while response-accuracy optimizations rely on non-generalizable supervised training, and ensemble approaches require computing an output from every candidate LLM. In this work, we tackle the challenge of selecting the optimal LLM from a model pool for specific queries with customizable…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sparse Evolutionary Training · Q-Learning · Focus
