MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs

Quang H. Nguyen; Thinh Dao; Duy C. Hoang; Juliette Decugis; Saurav Manchanda; Nitesh V. Chawla; Khoa D. Doan

arXiv:2407.10834 · cs.LG · April 23, 2025 · 2 cites

TL;DR

MetaLLM is a dynamic framework that intelligently routes queries to the most suitable LLMs, optimizing for both accuracy and cost-effectiveness in classification and question-answering tasks.

Contribution

It introduces a multi-armed-bandit-based approach to real-time LLM selection, improving accuracy and reducing cost compared with statically defaulting to a single LLM.
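To make the bandit idea concrete, here is a minimal sketch of a router that learns, per query, which LLM to call. It is not the authors' implementation. It assumes an epsilon-greedy contextual bandit with one linear reward model per LLM, and a reward equal to correctness minus a weighted cost. The class `LinearBanditRouter`, the arm names `small-llm` and `large-llm`, their costs, the two-element query features, and the simulated environment are all hypothetical, chosen only for illustration.

```python
import random


class LinearBanditRouter:
    """Epsilon-greedy contextual bandit with one linear reward model per LLM.

    Illustrative sketch only: the reward form and update rule are assumptions,
    not the method described in the paper.
    """

    def __init__(self, arm_costs, dim, cost_weight=0.5, epsilon=0.2, lr=0.1, seed=0):
        self.arm_costs = dict(arm_costs)   # assumed per-query cost of each LLM
        self.arms = sorted(self.arm_costs)
        self.cost_weight = cost_weight     # how strongly cost is penalized
        self.epsilon = epsilon             # exploration probability
        self.lr = lr                       # step size for the linear models
        self.rng = random.Random(seed)
        self.weights = {arm: [0.0] * dim for arm in self.arms}

    def predict(self, arm, features):
        """Predicted reward of sending this query to `arm`."""
        return sum(w * x for w, x in zip(self.weights[arm], features))

    def select(self, features):
        """Explore with probability epsilon, otherwise pick the best predicted arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.predict(arm, features))

    def update(self, arm, features, correct):
        """Observe the outcome for the chosen arm and take one gradient step."""
        reward = float(correct) - self.cost_weight * self.arm_costs[arm]
        error = reward - self.predict(arm, features)
        self.weights[arm] = [w + self.lr * error * x
                             for w, x in zip(self.weights[arm], features)]
        return reward


def simulate(steps=5000, seed=1):
    """Toy environment: the small model is only right on easy queries."""
    env = random.Random(seed)
    router = LinearBanditRouter({"small-llm": 0.1, "large-llm": 0.6}, dim=2)
    for _ in range(steps):
        hard = env.random() < 0.5
        features = [1.0, 1.0 if hard else 0.0]   # bias term + hypothetical difficulty flag
        arm = router.select(features)
        correct = (arm == "large-llm") or not hard
        router.update(arm, features, correct)
    return router


router = simulate()
router.epsilon = 0.0  # act greedily when inspecting the learned policy
print("easy query ->", router.select([1.0, 0.0]))
print("hard query ->", router.select([1.0, 1.0]))
```

In this toy setting the router should learn to send easy queries to the cheaper model and hard queries to the more expensive one, since only the chosen arm's reward is ever observed. A real deployment would replace the hand-made difficulty flag with a learned query representation and the simulated correctness signal with labeled feedback.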

Findings

01 Significantly improved accuracy and cost savings in experiments

02 Effective LLM routing across multiple platforms and open-source models

03 Demonstrates practical viability in real-world scenarios

Abstract

The rapid progress in machine learning (ML) has brought forth many large language models (LLMs) that excel in various tasks and areas. These LLMs come with different abilities and costs in terms of computation or pricing. Since the demand for each query can vary, e.g., because of the queried domain or its complexity, defaulting to one LLM in an application is not usually the best choice, whether it is the biggest, priciest, or even the one with the best average test performance. Consequently, picking the right LLM that is both accurate and cost-effective for an application is necessary yet remains a challenge. In this paper, we introduce MetaLLM, a framework that dynamically and intelligently routes each query to the optimal LLM (among several available LLMs) for classification and multi-choice question-answering tasks, achieving significantly improved accuracy and cost-effectiveness.…

Code & Models

Repositories

mail-research/metallm-wrapper (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Natural Language Processing Techniques

Methods: Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Attention Dropout · Adam · Dropout · Weight Decay