FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Lingjiao Chen, Matei Zaharia, James Zou

TL;DR
This paper introduces FrugalGPT, an LLM cascade that learns which combinations of LLMs to use for different queries, reducing inference cost while matching or improving accuracy.
Contribution
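The paper reviews the pricing of popular LLM APIs, outlines three strategies for reducing inference cost (prompt adaptation, LLM approximation, and LLM cascade), and proposes FrugalGPT, an LLM cascade that learns which combinations of models to use for different queries. Experiments show that it lowers cost and improves accuracy.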
Findings
FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction.
At the same cost as GPT-4, it can improve accuracy over GPT-4 by 4%.
The approach enables sustainable and efficient use of LLMs.
Abstract
There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM…
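The abstract describes the LLM cascade idea only at a high level. As a rough illustration, the sketch below assumes that models are tried from cheapest to most expensive and that a scoring function decides whether an answer is good enough to return. The names `Model`, `cascade`, `toy_scorer`, the flat per-query prices, and the threshold values are all hypothetical and are not taken from the paper; FrugalGPT learns which combinations of models to use, whereas this sketch fixes the order and thresholds by hand.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical stand-in for a paid LLM API."""
    name: str
    cost_per_query: float           # made-up flat fee, arbitrary units
    answer: Callable[[str], str]    # placeholder for a real API call


def cascade(
    query: str,
    models: List[Model],
    scorer: Callable[[str, str], float],
    thresholds: List[float],
) -> Tuple[str, str, float]:
    """Try models in the given order and stop at the first accepted answer.

    thresholds[i] is the minimum score for accepting models[i]'s answer;
    the last model's answer is always accepted, so len(thresholds) is
    len(models) - 1. Returns (answer, model name, total cost spent).
    """
    total_cost = 0.0
    answer, used = "", ""
    for i, model in enumerate(models):
        answer = model.answer(query)
        total_cost += model.cost_per_query
        used = model.name
        is_last = i == len(models) - 1
        if is_last or scorer(query, answer) >= thresholds[i]:
            break
    return answer, used, total_cost


# Toy setup (assumption): the cheap model only knows one question.
cheap = Model("cheap", 1.0, lambda q: "4" if q == "2+2" else "unsure")
big = Model("big", 50.0, lambda q: "answer:" + q)


def toy_scorer(query: str, answer: str) -> float:
    """Hypothetical reliability score: reject explicit 'unsure' answers."""
    return 0.0 if answer == "unsure" else 1.0


easy = cascade("2+2", [cheap, big], toy_scorer, thresholds=[0.5])
hard = cascade("hard question", [cheap, big], toy_scorer, thresholds=[0.5])
print(easy)
print(hard)
```

In this toy setup the easy query is answered by the cheap model alone, while the hard query pays for both calls. A cascade of this kind saves money only when enough queries stop at an early, cheap stage to offset the extra calls made on the ones that escalate.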
Taxonomy
Topics: Topic Modeling · Recommender Systems and Techniques · Privacy-Preserving Technologies in Data
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Dense Connections · Residual Connection · Adam
