The xPU-athalon: Quantifying the Competition of AI Acceleration
Alicia Golden, Carole-Jean Wu, Gu-Yeon Wei, David Brooks

TL;DR
This paper provides a comprehensive quantitative comparison of various AI accelerators and GPUs, analyzing performance, power, energy efficiency, and programmability across different workloads and configurations.
Contribution
It offers the first detailed benchmarking and analysis of emerging AI accelerators like Cerebras, SambaNova, and Gaudi against traditional GPUs, highlighting their trade-offs and optimization space.
Findings
Optimal hardware varies with workload parameters such as batch size and model size.
Cerebras, SambaNova, and Gaudi have significantly higher idle power than GPUs.
Power consumption and energy costs are heavily influenced by communication and utilization levels.
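As a rough illustration of why idle power and utilization matter for energy cost, the sketch below computes energy per token under a simple linear power model. The model, the function names, and all numbers are illustrative assumptions, not measurements or methods from the paper.

```python
def average_power_w(idle_w: float, peak_w: float, utilization: float) -> float:
    """Assumed linear power model: idle power plus a utilization-scaled dynamic part."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return idle_w + utilization * (peak_w - idle_w)


def energy_per_token_j(idle_w: float, peak_w: float, utilization: float,
                       peak_tokens_per_s: float) -> float:
    """Joules per token, assuming throughput scales linearly with utilization."""
    if utilization <= 0.0:
        raise ValueError("utilization must be positive to produce tokens")
    throughput = utilization * peak_tokens_per_s
    return average_power_w(idle_w, peak_w, utilization) / throughput


# Two hypothetical devices with the same peak power and peak throughput,
# differing only in idle power. Values are made up for illustration.
low_idle = dict(idle_w=50.0, peak_w=400.0, peak_tokens_per_s=1000.0)
high_idle = dict(idle_w=300.0, peak_w=400.0, peak_tokens_per_s=1000.0)

for u in (0.25, 1.0):
    e_low = energy_per_token_j(utilization=u, **low_idle)
    e_high = energy_per_token_j(utilization=u, **high_idle)
    print(f"utilization={u:.2f}: low-idle {e_low:.2f} J/token, "
          f"high-idle {e_high:.2f} J/token")
```

Under these assumptions the two devices cost the same energy per token at full utilization, but the high-idle device is more than twice as expensive at 25% utilization, which is the kind of gap that idle-power differences can produce.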
Abstract
The push for greater efficiency in AI computation has given rise to an array of accelerator architectures that increasingly challenge the GPU's long-standing dominance. In this work, we provide a quantitative view of this evolving landscape of AI accelerators, including the Cerebras CS-3, SambaNova SN-40, Groq, Gaudi, and TPUv5e platforms, and compare against both NVIDIA (A100, H100) and AMD (MI-300X) GPUs. We evaluate key trade-offs in latency, throughput, power consumption, and energy-efficiency across both (i) end-to-end workloads and (ii) benchmarks of individual computational primitives. Notably, we find the optimal hardware platform varies across batch size, sequence length, and model size, revealing a large underlying optimization space. Our analysis includes detailed power measurements across the prefill and decode phases of LLM inference, as well as quantification of the energy…
