The xPU-athalon: Quantifying the Competition of AI Acceleration

Alicia Golden; Carole-Jean Wu; Gu-Yeon Wei; David Brooks

arXiv:2604.10852·cs.AR·April 14, 2026

TL;DR

This paper quantitatively compares emerging AI accelerators against GPUs, analyzing performance, power, energy efficiency, and programmability across workloads and configurations.

Contribution

The paper offers the first detailed benchmarking and analysis of emerging AI accelerators such as Cerebras, SambaNova, and Gaudi against traditional GPUs, highlighting their trade-offs and the optimization space they expose.

Findings

01

Optimal hardware varies with workload parameters such as batch size and model size.

02

Cerebras, SambaNova, and Gaudi have significantly higher idle power than GPUs.

03

Power consumption and energy costs are heavily influenced by communication and utilization levels.
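
To make the link between idle power, utilization, and energy cost concrete, the following is a minimal sketch and is not taken from the paper. It assumes a simple linear power model (power rises from an idle floor to a peak as utilization increases) and throughput proportional to utilization. The function name, the model, and every number are illustrative assumptions, not measurements of any platform.

```python
def energy_per_token_joules(idle_w, peak_w, utilization, peak_tokens_per_s):
    """Estimate energy per generated token under an assumed linear power model.

    Assumptions (hypothetical, for illustration only):
      power(u)      = idle_w + u * (peak_w - idle_w)
      throughput(u) = u * peak_tokens_per_s
    where u is the utilization in (0, 1].
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if peak_tokens_per_s <= 0:
        raise ValueError("peak_tokens_per_s must be positive")
    power_w = idle_w + utilization * (peak_w - idle_w)
    throughput = utilization * peak_tokens_per_s
    # Watts divided by tokens/second gives joules per token.
    return power_w / throughput


if __name__ == "__main__":
    # Placeholder numbers: two hypothetical devices with the same peak power
    # and peak throughput but different idle power.
    for name, idle_w in [("low-idle device", 100.0), ("high-idle device", 300.0)]:
        for u in (1.0, 0.5, 0.25):
            e = energy_per_token_joules(idle_w, 500.0, u, 1000.0)
            print(f"{name}: utilization={u:.2f} -> {e:.2f} J/token")
```

Under these assumptions both devices cost the same energy per token at full utilization, but the device with the higher idle floor becomes progressively more expensive per token as utilization drops, because the idle power is amortized over fewer tokens.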

Abstract

The push for greater efficiency in AI computation has given rise to an array of accelerator architectures that increasingly challenge the GPU's long-standing dominance. In this work, we provide a quantitative view of this evolving landscape of AI accelerators, including the Cerebras CS-3, SambaNova SN-40, Groq, Gaudi, and TPUv5e platforms, and compare them against both NVIDIA (A100, H100) and AMD (MI-300X) GPUs. We evaluate key trade-offs in latency, throughput, power consumption, and energy efficiency across both (i) end-to-end workloads and (ii) benchmarks of individual computational primitives. Notably, we find that the optimal hardware platform varies across batch size, sequence length, and model size, revealing a large underlying optimization space. Our analysis includes detailed power measurements across the prefill and decode phases of LLM inference, as well as quantification of the energy…
