TL;DR
This paper introduces FORC, a meta-modeling framework that assigns each natural language prompt to a language model of suitable size, chosen from a set of candidate models, so that overall performance stays high while query cost stays low across tasks.
Contribution
We propose a novel meta-modeling approach for cost-effective language model selection that adapts to input difficulty, reducing costs while maintaining high performance.
Findings
FORC achieves up to 63% cost reduction compared to using the largest LM.
It matches the performance of the largest LM across multiple datasets.
The framework is flexible and can be tuned for different cost-performance tradeoffs.
Abstract
Generative language models (LMs) have become omnipresent across data science. For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted. LM performance has consistently been increasing with model size - but so has the monetary cost of querying the ever larger models. Importantly, however, not all inputs are equally hard: some require larger LMs for obtaining a satisfactory solution, whereas for others smaller LMs suffice. Based on this fact, we design a framework for cost-effective language model choice, called "Fly-swat or cannon" (FORC). Given a set of inputs and a set of candidate LMs, FORC judiciously assigns each input to an LM predicted to do well on the input according to a so-called meta-model, aiming to achieve high overall performance at low cost. The cost-performance tradeoff can be flexibly…
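The assignment step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the scoring rule (predicted success minus a weighted cost), the `Candidate` class, the `cost_weight` parameter, the flat per-query costs, and the word-count-based `toy_meta_model` are all assumptions made for illustration. In practice the meta-model would be a learned predictor of how well each LM does on a given input.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    """A candidate LM (hypothetical: flat cost per query, in arbitrary units)."""
    name: str
    cost_per_query: float


def assign(
    prompts: Sequence[str],
    candidates: Sequence[Candidate],
    predict_success: Callable[[str, Candidate], float],
    cost_weight: float,
) -> List[str]:
    """Pick, for each prompt, the candidate with the best predicted
    success minus cost_weight * cost (an assumed scoring rule)."""
    chosen = []
    for prompt in prompts:
        best = max(
            candidates,
            key=lambda c: predict_success(prompt, c) - cost_weight * c.cost_per_query,
        )
        chosen.append(best.name)
    return chosen


def toy_meta_model(prompt: str, candidate: Candidate) -> float:
    """Stand-in for a learned meta-model: longer prompts count as harder,
    and only the large model is assumed to handle hard prompts well."""
    difficulty = min(1.0, len(prompt.split()) / 20)
    if candidate.name == "large":
        return 1.0
    return 1.0 - difficulty


# Hypothetical candidates and prompts, for illustration only.
CANDIDATES = [Candidate("small", 1.0), Candidate("large", 10.0)]
EASY_PROMPT = "What is 2 + 2?"
HARD_PROMPT = " ".join(["token"] * 30)

balanced = assign([EASY_PROMPT, HARD_PROMPT], CANDIDATES, toy_meta_model, cost_weight=0.05)
quality_first = assign([EASY_PROMPT, HARD_PROMPT], CANDIDATES, toy_meta_model, cost_weight=0.0)
cost_first = assign([EASY_PROMPT, HARD_PROMPT], CANDIDATES, toy_meta_model, cost_weight=1.0)
```

With these toy numbers, a moderate `cost_weight` sends the short prompt to the small model and the long one to the large model. Setting it to zero sends everything to the large model, and a high value sends everything to the small one. A single knob of this kind is one way to picture the adjustable cost-performance tradeoff mentioned above.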
