FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

Lingjiao Chen; Matei Zaharia; James Zou

arXiv:2305.05176·cs.LG·May 10, 2023·50 cites

TL;DR

This paper introduces FrugalGPT, an LLM cascade that learns which combinations of LLMs to use for different queries, reducing inference cost while matching or improving on the accuracy of the best individual model.

Contribution

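The paper reviews the pricing of popular LLM APIs, outlines three cost-reduction strategies (prompt adaptation, LLM approximation, and LLM cascade), and proposes FrugalGPT, an instantiation of LLM cascade that learns which combinations of LLMs to use for different queries. Experiments show reduced cost and improved accuracy relative to the best individual LLM.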

Findings

01

FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction.

02

It can outperform GPT-4 in accuracy by 4% at the same cost.

03

The authors present these ideas and findings as a foundation for using LLMs sustainably and efficiently.

Abstract

There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM…
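The cascade idea in the abstract can be illustrated with a minimal sketch: query models from cheapest to most expensive and stop as soon as a scoring function judges an answer reliable enough. This is not the authors' implementation. The names `CascadeStage`, `run_cascade`, `cheap_model`, `strong_model`, and `toy_scorer`, along with the costs and thresholds, are hypothetical and hand-set here, whereas FrugalGPT learns which combinations of LLMs to use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CascadeStage:
    name: str                     # hypothetical model label
    cost: float                   # hypothetical cost charged per query
    answer: Callable[[str], str]  # stand-in for a real LLM API call
    threshold: float              # accept the answer if its score >= threshold


def run_cascade(
    query: str,
    stages: List[CascadeStage],
    scorer: Callable[[str, str], float],
) -> Tuple[str, str, float]:
    """Try stages in order; return (answer, stage name, total cost spent)."""
    total_cost = 0.0
    for i, stage in enumerate(stages):
        response = stage.answer(query)
        total_cost += stage.cost
        is_last = i == len(stages) - 1
        # The last stage is always accepted because nothing is left to try.
        if is_last or scorer(query, response) >= stage.threshold:
            return response, stage.name, total_cost
    raise ValueError("stages must not be empty")


# Toy stand-ins (assumptions for illustration, not real models).
def cheap_model(q: str) -> str:
    return "4" if q == "2+2" else "unsure"


def strong_model(q: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(q, "unknown")


def toy_scorer(q: str, a: str) -> float:
    # A real scorer would estimate answer reliability; this one only
    # penalizes the literal string "unsure".
    return 0.1 if a == "unsure" else 0.9


stages = [
    CascadeStage("cheap", 1.0, cheap_model, threshold=0.5),
    CascadeStage("strong", 50.0, strong_model, threshold=0.0),
]

easy = run_cascade("2+2", stages, toy_scorer)
hard = run_cascade("capital of France", stages, toy_scorer)
print(easy)  # the cheap stage answers, so only its cost is paid
print(hard)  # the query escalates, so both stages are paid for
```

In this toy setup the easy query costs 1.0 while the hard one costs 51.0, which shows the trade-off a cascade makes: escalation adds the cheaper stages' cost on top of the expensive call, so savings depend on how many queries the cheap stages can resolve.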

Code & Models

Repositories

stanford-futuredata/frugalgpt

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Privacy-Preserving Technologies in Data

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Dense Connections · Residual Connection · Adam