Locational Pricing for Generative-AI Services via Token-Flow Market Clearing
Shaohui Liu

TL;DR
This paper proposes a locational token-flow market model for dispatching generative-AI workloads across geographically distributed infrastructure, accounting for both operating cost and latency.
Contribution
It introduces a network-constrained token-flow market model, together with a transfer-aware extension, whose dual variables yield locational prices for AI services.
Findings
The transfer-aware model raises operating costs by 2.7% in a 5-node case study.
Locational prices can rise by 117% when chatbot latency is reduced from 100 ms to 15 ms.
The model's dispatch logic remains consistent at larger scales, but the problem becomes infeasible when demand exceeds capacity.
Abstract
GenAI services are in an early yet fast-expanding phase. Providers compete on model capability and service quality, while the underlying infrastructure remains expensive and heterogeneous across regions, workloads, and compute assets. If these services diffuse into routine daily use, the relevant engineering problem is not only building better models but also dispatching workloads efficiently across a geographically distributed AI service infrastructure. To address this, we formulate a network-constrained token-flow market that clears AI workloads across compute nodes and communication links. The baseline model is a linear program that co-optimizes routing and processing subject to compute-capacity and bandwidth constraints; its dual variables define location- and workload-specific marginal service prices. We further introduce a transfer-aware extension that prices data movement in physical units and isolates…
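To make the idea of duals acting as locational prices concrete, here is a minimal sketch. It is not the paper's formulation. It assumes a single workload type, a hypothetical per-unit processing cost and capacity at each compute node, and links whose bandwidth applies independently in each direction. Under those assumptions the clearing LP reduces to a min-cost flow, solved here by successive shortest paths. The helpers `min_cost_flow`, `clear_market`, and `locational_prices` and all numbers are illustrative. The paper reads prices directly from the LP duals. Here we instead take the cost of serving one extra unit at a node. Because every capacity in this toy instance is an integer, that one-unit difference equals the marginal (dual) price.

```python
from math import inf

def min_cost_flow(n, edges, s, t, need):
    """Successive shortest paths (Bellman-Ford). Returns min cost, or None if `need` units cannot be routed."""
    graph = [[] for _ in range(n)]  # graph[u] = [[v, residual_cap, cost, reverse_index], ...]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    flow, total = 0, 0
    while flow < need:
        dist, prev = [inf] * n, [None] * n
        dist[s] = 0
        for _ in range(n - 1):  # Bellman-Ford tolerates negative residual costs
            updated = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], updated = dist[u] + cost, (u, k), True
            if not updated:
                break
        if dist[t] == inf:
            return None  # compute capacity or bandwidth cannot cover demand
        push, v = need - flow, t
        while v != s:  # bottleneck capacity along the cheapest path
            u, k = prev[v]
            push, v = min(push, graph[u][k][1]), u
        v = t
        while v != s:  # update residual capacities
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        total += push * dist[t]
    return total

def clear_market(demand, compute, links):
    """demand: {node: units}; compute: {node: (capacity, unit_cost)}; links: [(a, b, bandwidth, unit_cost)]."""
    nodes = sorted(set(demand) | set(compute) | {x for a, b, _, _ in links for x in (a, b)})
    idx = {name: i for i, name in enumerate(nodes)}
    s, t = len(nodes), len(nodes) + 1  # super source (processing) and super sink (served demand)
    edges = [(s, idx[name], cap, cost) for name, (cap, cost) in compute.items()]
    for a, b, bw, cost in links:  # assumed: bandwidth limit per direction
        edges += [(idx[a], idx[b], bw, cost), (idx[b], idx[a], bw, cost)]
    edges += [(idx[name], t, d, 0) for name, d in demand.items()]
    return min_cost_flow(len(nodes) + 2, edges, s, t, sum(demand.values()))

def locational_prices(demand, compute, links, step=1):
    """Marginal cost of serving `step` extra units at each demand node."""
    base = clear_market(demand, compute, links)
    if base is None:
        raise ValueError("infeasible: demand exceeds deliverable capacity")
    prices = {}
    for node in demand:
        bumped = {**demand, node: demand[node] + step}
        cost = clear_market(bumped, compute, links)
        prices[node] = None if cost is None else (cost - base) / step
    return base, prices
```

Consider a usage example with two nodes. Node A has cheap compute (capacity 8, cost 1), node B has expensive compute (capacity 5, cost 3), and a link between them carries at most 2 units. With demand `{"A": 4, "B": 6}`, the congested link forces B to use its own expensive compute, so `locational_prices` returns a total cost of 18 with prices of 1 at A and 3 at B. If B's demand rises to 8, `clear_market` returns `None` because bandwidth, not total compute, is the binding limit. This mirrors, in miniature, the infeasibility the findings report when demand exceeds capacity.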
