Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference

Yuxuan Gao; Megan Wang; Yi Ling Yu

arXiv:2605.00300·cs.AI·May 4, 2026

TL;DR

TokenArena is a continuous, endpoint-level benchmark for AI inference. It measures output speed, time to first token, workload-blended price, effective context, and quality on the live endpoint across models and deployment configurations.

Contribution

TokenArena contributes an empirical and methodological framework for evaluating AI inference endpoints with continuous, multi-dimensional metrics, together with a publicly available leaderboard.

Findings

01

Model accuracy and energy use vary significantly across endpoints.

02

Workload-aware pricing significantly alters endpoint rankings.

03

The framework and leaderboard are publicly released for replication.
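
To illustrate finding 02, here is a minimal sketch of how a workload-aware price can reorder endpoints. It assumes the blended price is a weighted average of input and output token prices, weighted by the share of input tokens in the workload; the paper's actual blending formula is not given in this summary. The endpoint names, prices, and the `input_share` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    input_price: float   # USD per million input tokens (hypothetical value)
    output_price: float  # USD per million output tokens (hypothetical value)


def blended_price(ep: Endpoint, input_share: float) -> float:
    """Weighted-average price per million tokens.

    input_share is the fraction of a workload's tokens that are input
    tokens. This weighting is an assumption, not the paper's formula.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * ep.input_price + (1.0 - input_share) * ep.output_price


def rank_by_price(endpoints: list[Endpoint], input_share: float) -> list[str]:
    """Return endpoint names ordered from cheapest to most expensive."""
    ordered = sorted(endpoints, key=lambda ep: blended_price(ep, input_share))
    return [ep.name for ep in ordered]


# Two made-up endpoints: one with cheap input, one with cheap output.
ENDPOINTS = [
    Endpoint("endpoint-a", input_price=0.50, output_price=4.00),
    Endpoint("endpoint-b", input_price=1.50, output_price=2.00),
]

# Output-heavy workload (25% input tokens) versus input-heavy workload (90%).
chat_ranking = rank_by_price(ENDPOINTS, input_share=0.25)
rag_ranking = rank_by_price(ENDPOINTS, input_share=0.90)

print(chat_ranking)
print(rag_ranking)
```

With these made-up prices, the endpoint with cheap output tokens ranks first on the output-heavy workload and second on the input-heavy one, so a single list price cannot capture the ordering for both.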

Abstract

Public inference benchmarks compare AI systems at the model and provider level, but the unit at which deployment decisions are actually made is the endpoint: the (provider, model, stock-keeping-unit) tuple at which a specific quantization, decoding strategy, region, and serving stack is exposed. We introduce TokenArena, a continuous benchmark that measures inference at endpoint granularity along five core axes (output speed, time to first token, workload-blended price, effective context, and quality on the live endpoint) and synthesizes them, together with a modeled energy estimate, into three headline composites: joules per correct answer, dollars per correct answer, and endpoint fidelity (output-distribution similarity to a first-party reference). The framework's novelty is empirical and methodological. Across 78 endpoints serving 12 model families, the same model on different…
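
The two ratio composites named in the abstract can be sketched as a total cost divided by the number of correct answers. This is an assumed reading of "joules per correct answer" and "dollars per correct answer", not the paper's stated definition; the function name and all numbers below are hypothetical. Endpoint fidelity is left out because this excerpt does not say how output-distribution similarity is computed.

```python
import math


def per_correct_answer(total_cost: float, num_correct: int) -> float:
    """Divide a total cost (joules or dollars) by the count of correct answers.

    Returns infinity when no answers are correct, so that an endpoint that
    is cheap but always wrong never looks efficient. Assumed definition.
    """
    if total_cost < 0:
        raise ValueError("total_cost must be non-negative")
    if num_correct <= 0:
        return math.inf
    return total_cost / num_correct


# Hypothetical evaluation run on one endpoint: 200 questions, 150 correct.
num_correct = 150
total_energy_joules = 9000.0  # modeled energy estimate, made-up value
total_spend_usd = 0.60        # made-up value

joules_per_correct = per_correct_answer(total_energy_joules, num_correct)
dollars_per_correct = per_correct_answer(total_spend_usd, num_correct)

print(f"{joules_per_correct:.1f} J per correct answer")
print(f"{dollars_per_correct:.4f} USD per correct answer")
```

Dividing by correct answers rather than by tokens or requests means a drop in quality raises both composites even when raw price and energy stay the same.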
