In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb

TL;DR
This paper evaluates a custom Tensor Processing Unit (TPU) designed for neural network inference, demonstrating significant performance and energy efficiency advantages over CPUs and GPUs in datacenter workloads.
Contribution
It provides a detailed performance and energy efficiency analysis of the TPU, highlighting its deterministic execution model and advantages over traditional hardware.
Findings
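The TPU is on average about 15X-30X faster than its contemporary CPU or GPU on neural network inference workloads.
The TPU's TOPS/Watt is about 30X-80X higher than that of the CPU or GPU.
Using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and further raise TOPS/Watt.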
Abstract
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU…
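The peak-throughput figure in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the MAC count comes from the abstract, while the 700 MHz clock rate is taken from the full paper rather than the excerpt above and should be treated as an assumption here. Counting each multiply-accumulate as two operations (one multiply, one add) is the usual convention, and the function name `peak_tops` is hypothetical.

```python
import math

MAC_COUNT = 65_536   # 8-bit MACs in the matrix multiply unit (from the abstract)
OPS_PER_MAC = 2      # convention: one multiply plus one add per MAC per cycle
CLOCK_HZ = 700e6     # assumption: 700 MHz clock, not stated in the excerpt above


def peak_tops(mac_count: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Return peak throughput in TeraOps/second if every MAC is busy every cycle."""
    return mac_count * ops_per_mac * clock_hz / 1e12


# 65,536 is a perfect square, consistent with a square 256 x 256 arrangement.
array_side = math.isqrt(MAC_COUNT)

print(f"array side: {array_side}")                                  # array side: 256
print(f"peak: {peak_tops(MAC_COUNT, CLOCK_HZ):.2f} TOPS")           # peak: 91.75 TOPS
```

Under these assumptions the result rounds to the 92 TOPS quoted in the abstract. This is an upper bound that assumes every MAC does useful work on every cycle; achieved throughput on real workloads depends on how well the matrix unit is kept fed from memory.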
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing · Advanced Data Storage Technologies
