Towards Optimizing the Costs of LLM Usage

Shivanshu Shekhar; Tanishq Dubey; Koyel Mukherjee; Apoorv Saxena; Atharv Tyagi; Nishanth Kotla

arXiv:2402.01742 · cs.CL · February 6, 2024 · 5 cites

TL;DR

This paper presents a cost-optimization framework for LLM usage that predicts each LLM's output quality without invoking it. The predictions drive cost-effective LLM selection and token-reduction strategies, which are validated on diverse datasets.

Contribution

It introduces a novel quality prediction model, together with optimization algorithms for LLM selection and token reduction, that improve cost-efficiency while maintaining quality.

Findings

1. Costs reduced by 40%-90%
2. Quality improved by 4%-7%
3. Effective on both enterprise and open-source datasets

Abstract

Generative AI, and LLMs in particular, are heavily used today for document processing tasks such as question answering and summarization. However, different LLMs have different capabilities for different tasks, as well as different costs, tokenization, and latency. Enterprises are already incurring substantial costs from operating or using LLMs for their use cases. In this work, we propose optimizing the usage costs of LLMs by estimating their output quality (without actually invoking the LLMs) and then solving an optimization routine for LLM selection that either keeps costs under a budget or minimizes costs, in a quality- and latency-aware manner. We propose a model to predict the output quality of LLMs on document processing tasks like summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study optimization…
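The abstract pairs a quality predictor with an LP rounding step for budgeted LLM selection, but the exact formulation is not given here. A minimal sketch under stated assumptions: each document must be routed to exactly one model, each (document, model) pair has an estimated cost and a predicted quality, and the goal is to maximize total predicted quality with total cost at most a budget B. That is a multiple-choice knapsack problem. The sketch solves its LP relaxation with the textbook greedy over convex-hull "upgrades" and then rounds the single fractional choice down. This is a swapped-in standard technique, not the authors' algorithm. The names Option, lp_hull, and select_models, the model names, and all numbers are hypothetical, and latency is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    model: str      # hypothetical LLM identifier
    cost: float     # assumed estimated cost of running this model on the document
    quality: float  # assumed predicted output quality from a quality-prediction model


def lp_hull(options):
    """Keep only the options that can be non-zero in an optimal LP solution:
    the upper convex hull of (cost, quality), with dominated options removed."""
    # Sort by cost; for equal costs put the higher quality first.
    opts = sorted(options, key=lambda o: (o.cost, -o.quality))
    # Drop options that are no better than a cheaper (or equal-cost) one.
    pruned = []
    for o in opts:
        if pruned and o.quality <= pruned[-1].quality:
            continue
        pruned.append(o)
    # Drop LP-dominated options so marginal efficiencies strictly decrease.
    hull = []
    for o in pruned:
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # b is LP-dominated if slope(a -> b) <= slope(b -> o).
            if (b.quality - a.quality) * (o.cost - b.cost) <= (o.quality - b.quality) * (b.cost - a.cost):
                hull.pop()
            else:
                break
        hull.append(o)
    return hull


def select_models(docs, budget):
    """docs: one list of Option per document. Returns (chosen options, total cost)."""
    hulls = [lp_hull(opts) for opts in docs]
    choice = [0] * len(hulls)          # start every document on its cheapest hull option
    spent = sum(h[0].cost for h in hulls)
    if spent > budget:
        raise ValueError("budget cannot cover the cheapest model for every document")
    # Each upgrade moves one document one step up its hull.
    upgrades = []
    for i, h in enumerate(hulls):
        for k in range(1, len(h)):
            upgrades.append((i, k, h[k].cost - h[k - 1].cost, h[k].quality - h[k - 1].quality))
    # LP greedy: best quality gain per unit cost first. Within one document the
    # efficiencies decrease, and the sort is stable, so step k precedes step k + 1.
    upgrades.sort(key=lambda u: u[3] / u[2], reverse=True)
    for i, k, d_cost, _ in upgrades:
        if spent + d_cost > budget:
            # The LP would take this upgrade fractionally; rounding it down keeps
            # the solution feasible and loses at most this one quality gain.
            break
        choice[i] = k
        spent += d_cost
    return [hulls[i][k] for i, k in enumerate(choice)], spent
```

For example, with two documents whose options are small (cost 1.0), medium (3.0), and large (10.0), a budget of 6.0 routes both documents to medium, because the large upgrades buy little predicted quality per unit of cost. Under this formulation, the rounded integral solution is within one upgrade's quality gain of the LP optimum.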
