MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
Quang H. Nguyen, Thinh Dao, Duy C. Hoang, Juliette Decugis, Saurav Manchanda, Nitesh V. Chawla, Khoa D. Doan

TL;DR
MetaLLM is a dynamic framework that intelligently routes queries to the most suitable LLMs, optimizing for both accuracy and cost-effectiveness in classification and question-answering tasks.
Contribution
MetaLLM introduces a multi-armed-bandit-based approach to real-time LLM selection, improving accuracy and reducing cost relative to statically defaulting to a single model.
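To make the bandit idea concrete, here is a minimal sketch of a router that learns which model to call from observed rewards. It is not the authors' implementation: it assumes an epsilon-greedy strategy and a reward equal to correctness minus a weighted cost, and the class name, model names, costs, and accuracies are all hypothetical. It also ignores the content of each query, whereas the paper routes per query.

```python
import random


class BanditRouter:
    """Epsilon-greedy multi-armed bandit over a fixed set of LLM 'arms'.

    Illustrative only: the reward definition and all parameters are assumptions.
    """

    def __init__(self, costs, cost_weight=0.5, epsilon=0.1, seed=0):
        self.costs = dict(costs)        # hypothetical per-query cost of each model
        self.cost_weight = cost_weight  # how strongly cost is penalized
        self.epsilon = epsilon          # probability of exploring a random arm
        self.rng = random.Random(seed)
        self.counts = {name: 0 for name in self.costs}
        self.values = {name: 0.0 for name in self.costs}  # running mean reward

    def select(self):
        # Try every arm once before trusting the estimates.
        untried = [name for name, n in self.counts.items() if n == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.costs))
        return max(self.values, key=self.values.get)

    def update(self, name, correct):
        # Assumed reward: 1 if the answer was correct, minus a weighted cost.
        reward = float(correct) - self.cost_weight * self.costs[name]
        self.counts[name] += 1
        # Incremental update of the running mean reward for this arm.
        self.values[name] += (reward - self.values[name]) / self.counts[name]
        return reward


def simulate(router, accuracy, rounds=5000, seed=1):
    """Feed the router simulated outcomes drawn from made-up accuracies."""
    rng = random.Random(seed)
    for _ in range(rounds):
        name = router.select()
        correct = rng.random() < accuracy[name]
        router.update(name, correct)
    return router


# Hypothetical models: larger ones are more accurate but cost more.
costs = {"small": 0.1, "medium": 0.5, "large": 1.0}
accuracy = {"small": 0.70, "medium": 0.75, "large": 0.80}
router = simulate(BanditRouter(costs), accuracy)
print(router.counts)
```

With these made-up numbers the expected rewards are 0.65, 0.50, and 0.30, so the cheapest model is the best arm even though it is the least accurate. That mirrors the abstract's point that the biggest or priciest model is not always the right default.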
Findings
Significantly improved accuracy and cost savings in experiments
Effective LLM routing across multiple platforms and open-source models
Demonstrates practical viability in real-world scenarios
Abstract
The rapid progress in machine learning (ML) has brought forth many large language models (LLMs) that excel in various tasks and areas. These LLMs come with different abilities and costs in terms of computation or pricing. Since the demand for each query can vary, e.g., because of the queried domain or its complexity, defaulting to one LLM in an application is not usually the best choice, whether it is the biggest, priciest, or even the one with the best average test performance. Consequently, picking the right LLM that is both accurate and cost-effective for an application is necessary yet remains a challenge. In this paper, we introduce MetaLLM, a framework that dynamically and intelligently routes each query to the optimal LLM (among several available LLMs) for classification and multi-choice question-answering tasks, achieving significantly improved accuracy and cost-effectiveness.…
Taxonomy
Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Natural Language Processing Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Attention Dropout · Adam · Dropout · Weight Decay
