TL;DR
This study analyzes how hosted open-weight LLM APIs function as dynamic, provider-specific services rather than static models, revealing demand patterns, provider behavior, and task-dependent provider choices.
Contribution
The paper introduces a measurement methodology and an empirical analysis showing that hosted LLM APIs are heterogeneous, evolving services shaped by provider- and task-specific factors.
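The abstract defines the operational unit as a service object: a provider-specific, time-varying endpoint described by model variant, protocol behavior, context capacity, listed price, latency and throughput distribution, reliability, and task feasibility. The sketch below records those attributes as one immutable Python record. It is a minimal illustration under stated assumptions: the class name, field names, and the helper `same_model_different_service` are hypothetical, not AI Ping's schema or the authors' code. What it shows is that two endpoints serving the same model variant are still distinct objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


# Hypothetical record type; field names are illustrative, not a published schema.
@dataclass(frozen=True)
class ServiceObject:
    model_variant: str                 # weights/release the endpoint claims to serve
    provider: str                      # who hosts the endpoint
    snapshot_date: str                 # ISO date of the observation
    protocol_features: frozenset[str]  # e.g. {"tool_calls", "json_mode"}
    context_tokens: int                # advertised context capacity
    usd_per_mtok_in: float             # listed input price per million tokens
    usd_per_mtok_out: float            # listed output price per million tokens
    latency_s: tuple[float, ...]       # observed time-to-first-token samples
    throughput_tps: tuple[float, ...]  # observed output tokens/second samples
    success_rate: float                # fraction of probes that succeeded
    feasible_tasks: frozenset[str]     # task types the endpoint handled

    def median_latency(self) -> float | None:
        """Median observed latency, or None if nothing was measured."""
        return median(self.latency_s) if self.latency_s else None


def same_model_different_service(a: ServiceObject, b: ServiceObject) -> bool:
    """True when both endpoints name one model variant but differ as services."""
    return a.model_variant == b.model_variant and a != b


# Toy values for illustration only, not measurements from the paper.
a = ServiceObject("model-x", "provider-a", "2025-11-01", frozenset({"tool_calls"}),
                  131072, 0.50, 1.50, (0.8, 1.1, 0.9), (60.0, 55.0), 0.99,
                  frozenset({"chat", "code"}))
b = ServiceObject("model-x", "provider-b", "2025-11-01", frozenset(),
                  32768, 0.30, 1.00, (2.0,), (25.0,), 0.95, frozenset({"chat"}))
print(same_model_different_service(a, b))
print(a.median_latency(), b.median_latency())
```

Freezing the record makes each observation hashable, so snapshots taken at different dates can be stored in sets or used as dictionary keys when tracking how one endpoint changes over time.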
Findings
Demand is concentrated but persistent across versions.
Provider listing breadth does not guarantee adoption.
Task type influences provider choice and performance.
Abstract
Open-weight large language models (LLMs) are usually named as model artifacts, but production users often consume them as hosted API services. This paper argues that the operational unit is a service object: a provider-specific, time-varying endpoint defined by model variant, protocol behavior, context capacity, listed price, latency and throughput distribution, reliability, and task feasibility. Using sampled request logs, provider metadata, compatibility probes, pricing snapshots, and continuous latency measurements collected by AI Ping during Q4 2025, we study how this service layer changes the meaning of "the same model." Three empirical patterns emerge. First, observed demand is concentrated but persistent across versions: in the displayed family aggregate, the largest family carries 32.0% of relative demand and the top five carry 87.4%, with a Gini coefficient of 0.693, while…
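The concentration figures quoted in the abstract (largest-family share, top-five share, Gini coefficient) are standard summary statistics over per-family demand. A minimal Python sketch of how they can be computed follows. The function names and toy numbers are hypothetical and are not the paper's data, and the paper does not state which Gini estimator it uses, so the sorted-rank form below, G = 2·Σ i·x_i / (n·Σ x_i) − (n+1)/n with x sorted ascending and i = 1..n, is an assumption.

```python
from typing import Sequence


def top_k_share(demand: Sequence[float], k: int) -> float:
    """Fraction of total demand carried by the k largest entries."""
    total = sum(demand)
    if total <= 0:
        raise ValueError("total demand must be positive")
    return sum(sorted(demand, reverse=True)[:k]) / total


def gini(demand: Sequence[float]) -> float:
    """Gini coefficient of non-negative demand values (0 = perfectly even).

    Sorted-rank form: G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n,
    with x sorted ascending and i running from 1 to n.
    """
    values = sorted(demand)
    n = len(values)
    if any(v < 0 for v in values):
        raise ValueError("demand values must be non-negative")
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive demand value")
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical toy shares, NOT the paper's data.
toy = [40.0, 25.0, 15.0, 10.0, 5.0, 3.0, 2.0]
print(f"top-1 share: {top_k_share(toy, 1):.3f}")
print(f"top-5 share: {top_k_share(toy, 5):.3f}")
print(f"Gini: {gini(toy):.3f}")
```

With n groups this estimator ranges from 0 (equal demand) to (n − 1)/n (all demand in one group), so a Gini value should be read together with the number of families it was computed over.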
