Beyond Benchmarks: The Economics of AI Inference

Boqin Zhuang; Jiacheng Qiao; Mingqian Liu; Mingxing Yu; Ping Hong; Rui Li; Xiaoxia Song; Xiangjun Xu; Xu Chen; Yaoyao Ma; Yujie Gao

arXiv:2510.26136·cs.AI·October 31, 2025

TL;DR

This paper develops an economic framework for understanding the costs and efficiencies of AI inference in large language models, providing insights into deployment and market optimization.

Contribution

The paper introduces the first empirical "LLM Inference Production Frontier" and establishes economic principles that govern the trade-off between inference cost and output quality.

Findings

01. Diminishing marginal cost with increased inference scale

02. Diminishing returns to scale in LLM inference

03. Identification of an optimal cost-effectiveness zone

Abstract

The inference cost of Large Language Models (LLMs) has become a critical factor in determining their commercial viability and widespread adoption. This paper introduces a quantitative "economics of inference" framework, treating the LLM inference process as a compute-driven intelligent production activity. We analyze its marginal cost, economies of scale, and quality of output under various performance configurations. Based on empirical data from WiNEval-3.0, we construct the first "LLM Inference Production Frontier," revealing three principles: diminishing marginal cost, diminishing returns to scale, and an optimal cost-effectiveness zone. This paper not only provides an economic basis for model deployment decisions but also lays an empirical foundation for the future market-based pricing and optimization of AI inference resources.
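
To make the idea of a production frontier concrete, the sketch below treats it in the standard economic sense: the set of configurations for which no alternative is both cheaper and at least as good. This is only an illustration under stated assumptions, not the paper's method. The configuration names, cost figures, and quality scores are hypothetical placeholders (they are not WiNEval-3.0 data), and the helper names `production_frontier` and `incremental_cost_per_quality_point` are invented here.

```python
from typing import List, Tuple

# (name, cost per unit of output, quality score). All values are hypothetical
# placeholders for illustration; none of them come from the paper.
Config = Tuple[str, float, float]


def production_frontier(configs: List[Config]) -> List[Config]:
    """Return the non-dominated configurations, ordered by ascending cost.

    A configuration is kept only if its quality is strictly higher than
    that of every configuration that costs the same or less.
    """
    frontier: List[Config] = []
    best_quality = float("-inf")
    # Sort by ascending cost; on equal cost, put the higher quality first.
    for name, cost, quality in sorted(configs, key=lambda c: (c[1], -c[2])):
        if quality > best_quality:
            frontier.append((name, cost, quality))
            best_quality = quality
    return frontier


def incremental_cost_per_quality_point(frontier: List[Config]) -> List[float]:
    """Extra cost paid per extra quality point between consecutive frontier points.

    Cost and quality both increase strictly along the frontier, so the
    denominator is never zero.
    """
    return [
        (b[1] - a[1]) / (b[2] - a[2])
        for a, b in zip(frontier, frontier[1:])
    ]


# Hypothetical deployment configurations.
configs: List[Config] = [
    ("A", 1.0, 60.0),
    ("B", 2.0, 70.0),
    ("C", 2.5, 65.0),  # dominated by B: costs more, scores lower
    ("D", 4.0, 75.0),
    ("E", 5.0, 74.0),  # dominated by D: costs more, scores lower
]

frontier = production_frontier(configs)
steps = incremental_cost_per_quality_point(frontier)
```

In this toy data the frontier consists of A, B, and D, and each further quality point costs more when moving from B to D than from A to B. A rising incremental cost of this kind is one simple way to picture why a cost-effectiveness zone might exist, although the paper's own definitions of marginal cost and returns to scale may differ from this simplification.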
