A Theory of Training Profit-Optimal LLMs
Sophie Hao, William Merrill

TL;DR
This paper develops an economic model to determine the profit-maximizing scale of training large language models, considering costs, hardware efficiency, and data constraints.
Contribution
It introduces a novel economic framework combining scaling laws with microeconomic theory to analyze profit optimization in LLM training.
Findings
Optimal model size scales with hardware efficiency in the compute-bound regime.
Training expenditure scales as data squared divided by hardware efficiency.
Current industry trends align with profit-maximizing strategies in the compute-bound regime.
Abstract
Scaling LLMs requires tremendous computational resources, and recent advances in AI have gone hand in hand with massive amounts of capital expenditure. While it is established that scaling up LLMs reliably increases model quality (quantified in terms of loss or downstream evaluations), it is unclear how these quality improvements translate to potential revenue, and whether revenue increases would offset costs of larger-scale training and inference. In this work, we develop an economic model for characterizing the rational behavior of an LLM training firm by combining scaling laws with microeconomic theory. Under our model of firm behavior, LLM quality can be increased with more parameters and training tokens, leading to more potential adoption by consumers, who each have a quality threshold for using the LLM. On the other hand, additional parameters and training tokens both incur…
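To make the shape of the model described in the abstract concrete, the following is a minimal toy sketch, not the authors' formulation. It assumes a generic power-law loss in parameters and tokens, consumer quality thresholds spread uniformly over a loss interval, the common 6 x parameters x tokens approximation for training FLOPs, and a hardware efficiency expressed in FLOPs per dollar. Inference cost is left out. Every function name, constant, and the grid search are hypothetical placeholders chosen for illustration.

```python
# Toy sketch only: all constants and functional forms below are assumptions,
# not values or equations taken from the paper.

def loss(n_params, n_tokens, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Assumed power-law loss: falls as parameters and training tokens grow."""
    return E + A / n_params**alpha + B / n_tokens**beta

def adoption_fraction(model_loss, lo=1.8, hi=2.6):
    """Share of consumers whose threshold is met, assuming thresholds are
    uniform on [lo, hi] in loss units (a consumer adopts if loss <= threshold)."""
    return min(1.0, max(0.0, (hi - model_loss) / (hi - lo)))

def training_cost(n_params, n_tokens, flops_per_dollar):
    """Dollar cost using the approximation of 6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens / flops_per_dollar

def profit(n_params, n_tokens, flops_per_dollar, market_value=1e9):
    """Revenue from adopters minus training cost (inference cost omitted)."""
    revenue = market_value * adoption_fraction(loss(n_params, n_tokens))
    return revenue - training_cost(n_params, n_tokens, flops_per_dollar)

def best_scale(flops_per_dollar):
    """Brute-force search over a log-spaced grid of (parameters, tokens)."""
    param_grid = [10 ** (8 + 0.25 * i) for i in range(17)]   # 1e8 to 1e12
    token_grid = [10 ** (9 + 0.25 * i) for i in range(21)]   # 1e9 to 1e14
    candidates = [(n, d) for n in param_grid for d in token_grid]
    return max(candidates, key=lambda nd: profit(nd[0], nd[1], flops_per_dollar))

if __name__ == "__main__":
    for efficiency in (1e16, 1e17, 1e18):
        n, d = best_scale(efficiency)
        print(f"{efficiency:.0e} FLOPs/$ -> params={n:.2e}, tokens={d:.2e}")
```

Sweeping `flops_per_dollar` in this sketch shows how the profit-maximizing grid point moves as hardware gets cheaper. The paper's quantitative scaling results depend on its own demand and cost assumptions, which this toy does not reproduce.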
